Hi Reader,

Sesame’s voices cross the uncanny valley, leaving users “freaked out” - Demo & Research

I tried Sesame’s new voice model this week, and wow - it’s frankly unsettling in its realism. If you look at just one thing from this newsletter, check out the link above. These aren’t your standard robot voices; they laugh, stumble over words, take breaths, and are imbued with a whole lot of 'character'. Shopify CEO Tobi Lütke wasn’t exaggerating when he tweeted that it’s “absolutely insane”.

Sesame has strong backing and came out of stealth mode with this demo. Founded by Oculus co-founder Brendan Iribe, it has secured major venture capital from firms like Andreessen Horowitz and Spark Capital to develop not just voice AI but potentially AR glasses with voice-assistant integration. Imagine walking around with a voice like that in your ear, commenting on what it can see through your smart glasses.

Technically, what Sesame has accomplished is notable, especially for a model with only ~8.3B parameters. According to their research blog, they’re using two AI models working together (a backbone and a decoder) based on Meta’s LLaMA architecture, trained on roughly a million hours of audio. The team is honest about its limitations, though - CEO Brendan Iribe admits it’s “too eager and often inappropriate in its tone,” doesn’t handle interruptions well, and can feel inconsistent. The voices are impressive, but the "brains" aren't - they're using Google's open-weights Gemma model.

Now imagine this tech paired with today's most powerful AI. Social media filter bubbles feel quaint compared to a voice assistant that sounds like your college roommate, reinforcing your beliefs while building emotional rapport. Voice phishing is just the appetizer in this potential buffet of manipulation. If you thought doom scrolling was addictive, wait till your AI friend is whispering personalized confirmation bias directly into your ear.
To say nothing of what happens if the company making the AI trains it with an agenda in mind. Oh yeah, it will be open-source soon under an Apache 2.0 license. So expect to see more AI talking like this soon.

Manus AI: Claude Sonnet in disguise, but impressive nonetheless - Demo video

So Monica, a Chinese startup, has officially unveiled Manus - supposedly the “first general AI agent.” Their demo is impressive, which, combined with some shiny benchmarks and a hard-to-get-on waitlist, guarantees a chorus of 'this changes everything' hype. Manus works asynchronously in the cloud, meaning you can close your laptop and come back to completed work. The demo shows it screening resumes, researching properties, and building interactive data visualizations without constant supervision.

Tech researcher Jian Liao jailbroke Manus by asking for files at “/opt/.manus/” and got it to divulge its system prompts and other internals. They also found evidence that it simply runs on Claude (Anthropic's strong model) and builds on top of the popular browser-use project - all fairly standard in other agents. So probably not that much new under the hood, but a good job packaging it all together into something usable.

I've seen a lot of commentary claiming this is the best agent experience people have had, and plenty of people calling this China’s “second DeepSeek moment.” There’s healthy skepticism though - Kyle Wiggers at TechCrunch argues the hype outpaces reality, noting that in his tests Manus struggled with complex multi-step tasks and often produced incorrect answers when retrieving information from the web. He views the performance as significantly behind OpenAI’s and Anthropic’s top agents.

I reckon Manus is as much a story about what people want AI agents to be as it is about what they can do now. The excitement around it reveals a desire for AI that can truly work autonomously on our behalf - not just answer questions, but complete real tasks from start to finish.
The gap between the current reality and the demo-driven hype is worth noting, but so is the direction it’s pointing us toward. I don't know about the time frame, but I think every knowledge worker will be delegating more and more tasks to AI each year.

Alibaba’s QwQ model: the tiny model that thinks like a giant - Alibaba Cloud

The AI race just got more interesting. Alibaba’s Qwen team dropped QwQ-32B last week - a model they’re calling a “compact reasoning specialist” that punches way above its weight. Despite having only 32 billion parameters (tiny compared to giants like DeepSeek R1’s 671B), QwQ-32B is reportedly matching or beating state-of-the-art models on complex reasoning tasks.

It's like that scrappy little sibling who effortlessly solves your math homework and then casually switches to Chinese mid-sentence just to mess with you. Literally - users report it switching to Chinese mid-sentence, which I think is hilarious. Just don't ask it about sensitive political content, I guess. Though I was able to jailbreak the model on my laptop with a foot-in-the-door approach and have a jolly old conversation about Tiananmen Square just fine - which is interesting when you think about what that means for censorship in China.

In Alibaba’s internal tests, QwQ outperformed DeepSeek R1 on several benchmarks and even rivaled OpenAI’s o1-mini model. The specs are impressive: a 131,072-token context window (that’s hundreds of pages of text), multilingual capabilities, and strong performance in math, coding, and logical reasoning. But perhaps the most significant aspect is that QwQ is fully open-weights under an Apache 2.0 license - meaning anyone can download, modify, and use it freely. This puts a genuinely powerful reasoning model in developers’ hands without API restrictions or costs.

Wall Street certainly took notice - Alibaba’s stock jumped over 7% following the release, with analysts framing this as China’s answer to cutting-edge Western AI.
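As a sense check on that “hundreds of pages” figure for the 131,072-token context window, here’s a quick back-of-envelope calculation. The words-per-token and words-per-page ratios are my own ballpark assumptions, not official specs:

```python
# Back-of-envelope: how much text fits in a 131,072-token context window.
# Assumes ~0.75 English words per token and ~500 words per printed page;
# both are rough rules of thumb, not figures from Alibaba.
CONTEXT_TOKENS = 131_072
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

words = CONTEXT_TOKENS * WORDS_PER_TOKEN
pages = words / WORDS_PER_PAGE

print(f"~{words:,.0f} words, roughly {pages:.0f} pages")
# → ~98,304 words, roughly 197 pages
```

So “hundreds of pages” holds up: on the order of a 200-page book in a single prompt.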
The release came amid Beijing’s renewed push to support next-gen tech development, suggesting China is serious about closing any AI gaps. Maybe that's what people mean when they say a “DeepSeek moment”: AI news that moves the stock market.

cheers,
JV