Hi Reader,

Sesame’s voices cross the uncanny valley, leaving users “freaked out” - Demo & Research

I tried Sesame’s new voice model this week, and wow - it’s frankly unsettling in its realism. If you look at just one thing from this newsletter, check out the link above. These aren’t your standard robot voices; they laugh, stumble over words, take breaths, and are imbued with a whole lot of 'character'. Shopify CEO Tobi Lütke wasn’t exaggerating when he tweeted that it’s “absolutely insane”.

Sesame has strong backing and came out of stealth mode with this demo. Founded by Oculus co-founder Brendan Iribe, it has secured major venture capital from firms like Andreessen Horowitz and Spark Capital to develop not just voice AI but potentially AR glasses with voice-assistant integration. Imagine walking around with a voice like that in your ear, commenting on what it can see through your smart glasses.

Technically, what Sesame has accomplished is notable, especially for a model with only ~8.3B parameters. According to their research blog, they’re using two AI models working together (a backbone and a decoder) based on Meta’s LLaMA architecture, trained on roughly a million hours of audio. The team is honest about its limitations, though - CEO Brendan Iribe admits it’s “too eager and often inappropriate in its tone,” doesn’t handle interruptions well, and can feel inconsistent. The voices are impressive, but the "brains" aren't - they're using Google's open-weights Gemma model.

Now imagine this tech paired with today's most powerful AI. Social media filter bubbles feel quaint compared to a voice assistant that sounds like your college roommate, reinforcing your beliefs while building emotional rapport. Voice phishing is just the appetizer in this potential buffet of manipulation. If you thought doom scrolling was addictive, wait till your AI friend is whispering personalized confirmation bias directly into your ear.
To say nothing of what happens if the company making the AI trains it with an agenda in mind. Oh yeah, it will be open-source soon under an Apache 2.0 license. So expect to see more AI talking like this soon.

Manus AI: Claude Sonnet in disguise, but impressive nonetheless - Demo video

So Monica, a Chinese startup, has officially unveiled Manus - supposedly the “first general AI agent.” Their demo is impressive, which, combined with some shiny benchmarks and a hard-to-get-on waitlist, guarantees a chorus of 'this changes everything' hype. Manus works asynchronously in the cloud, meaning you can close your laptop and come back to completed work. The demo shows it screening resumes, researching properties, and building interactive data visualizations without constant supervision.

Tech researcher Jian Liao jailbroke Manus by asking for files at “/opt/.manus/” and got it to divulge its system prompts and other internals. They also found evidence that it simply runs on Claude (Anthropic's strong model) and builds on top of the popular browser-use project - all fairly standard in other agents. So probably not that much new under the hood, but a good job packaging it all together into something usable.

I've seen a lot of commentary claiming this is the best agent experience people have had, and plenty of people calling this China’s “second DeepSeek moment.” There’s healthy skepticism though - Kyle Wiggers at TechCrunch argues the hype outpaces reality, noting that in his tests Manus struggled with complex multi-step tasks and often produced incorrect answers when retrieving information from the web. He views the performance as significantly behind OpenAI’s and Anthropic’s top agents.

I reckon Manus is as much a story about what people want AI agents to be as it is about what they can do now. The excitement around it reveals a desire for AI that can truly work autonomously on our behalf - not just answer questions, but complete real tasks from start to finish.
The gap between the current reality and the demo-driven hype is worth noting, but so is the direction it’s pointing us toward. I don't know about the time frame, but I think every knowledge worker will be delegating more and more tasks to AI each year.

Alibaba’s QwQ model: the tiny model that thinks like a giant - Alibaba Cloud

The AI race just got more interesting. Alibaba’s Qwen team dropped QwQ-32B last week - a model they’re calling a “compact reasoning specialist” that punches way above its weight. Despite having only 32 billion parameters (tiny compared to giants like DeepSeek R1’s 671B), QwQ-32B is reportedly matching or beating state-of-the-art models on complex reasoning tasks.

It's like that scrappy little sibling who effortlessly solves your math homework and then casually switches to Chinese mid-sentence just to mess with you. Literally - users report it switching to Chinese mid-sentence, which I think is hilarious. Just don't ask it about sensitive political content, I guess. Though I was able to jailbreak the model on my laptop with a foot-in-the-door approach and have a jolly old conversation about Tiananmen Square just fine - which is interesting when you think about what that means for censorship in China.

In Alibaba’s internal tests, QwQ outperformed DeepSeek R1 on several benchmarks and even rivaled OpenAI’s o1-mini model. The specs are impressive: a 131,072-token context window (that’s hundreds of pages of text), multilingual capabilities, and strong performance in math, coding, and logical reasoning. But perhaps the most significant aspect is that QwQ is fully open-weights under an Apache 2.0 license - meaning anyone can download, modify, and use it freely. This puts a genuinely powerful reasoning model in developers’ hands without API restrictions or costs.

Wall Street certainly took notice - Alibaba’s stock jumped over 7% following the release, with analysts framing this as China’s answer to cutting-edge Western AI.
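As a sense check on that “hundreds of pages” figure for the 131,072-token context window, here’s a quick back-of-envelope calculation. The words-per-token and words-per-page ratios are my own ballpark assumptions, not official specs:

```python
# Back-of-envelope: how much text fits in a 131,072-token context window.
# Assumes ~0.75 English words per token and ~500 words per printed page;
# both are rough rules of thumb, not figures from Alibaba.
CONTEXT_TOKENS = 131_072
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

words = CONTEXT_TOKENS * WORDS_PER_TOKEN
pages = words / WORDS_PER_PAGE

print(f"~{words:,.0f} words, roughly {pages:.0f} pages")
# → ~98,304 words, roughly 197 pages
```

So “hundreds of pages” holds up: on the order of a 200-page book in a single prompt.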
The release came amid Beijing’s renewed push to support next-gen tech development, suggesting China is serious about closing any AI gaps. Maybe that's what people mean when they say a “DeepSeek moment”: AI news that moves the stock market.

cheers,
JV