Hi Reader,

Here are three things I found interesting in the world of AI in the last week:

OpenAI launches full featured Agent tool - announcement

OpenAI just dropped ChatGPT Agent today, combining Operator and Deep Research into one interface. I spent a few hours testing it and honestly? It's pretty crap. It can do some stuff, but the limited integrations with external systems really limit its utility.

Integrations kind of work, which is a big improvement over Operator, which made you manually log in to a website to do anything with it. The new Connections feature makes linking Gmail and Drive straightforward - but they've really locked the features down to read only. If you want an agent to draft emails for you or save files in your Drive, you'll have to log in manually or look elsewhere. They have also locked some integrations to specific modes - Agent Mode, deep research, or general use - but not others. For example, Agent Mode can read from Gmail and Google Drive, but general use can only read from Google Drive.

Claude is a long way ahead in terms of integrations: you can just ask it to read your email without having to switch into a slow agent mode, and Claude's ability to read and write to the filesystem is a game changer for many workflows.

Agent Mode is very slow, as it spins up its own contained environment, makes a plan, loads a browser, and clicks around - and it has the horrible limitation of making you watch whenever it's doing something it considers high risk, e.g. browsing Instagram. I set it the task of researching real estate agents in Bali and kept coming back to a stalled job because it wanted supervision. That kind of defeats the purpose of the whole thing, and I found myself disagreeing with its safety limitations continuously. Just give me a toggle to create draft emails instead of refusing entirely and encouraging me to log in to the web interface for my email.

Will ChatGPT Agent improve? Absolutely.
OpenAI has a track record of shipping early and iterating fast. In 12 months it might be quite useful. But right now, it's all potential. I'd recommend folks use Claude with some basic integrations - you'll have a much better time. If you invest a little time in learning how to run local MCP servers you'll be able to do a lot more, and if you learn to code a little bit then the world is your oyster in terms of automations.

Google's zero-click apocalypse accelerates with AI summaries in Discover - article

Economics have been tough for news publishers for a long time now, and it's getting worse. The numbers are brutal: zero-click searches have jumped from 56% to 69% since AI Overviews launched in May 2024. That's a 13 percentage point increase in just over a year. Publishers are watching their traffic evaporate faster than water on a hot sidewalk, and Google just launched AI summaries in its Discover feed, which will turn the screws even more.

The new Discover AI summaries are particularly clever (or insidious, depending on your perspective). Instead of showing a single publication's logo and headline, you now see multiple overlapping publisher icons with an AI-generated summary below. It gives users the full context without clicking through, complete with a "Generated with AI, which can make mistakes" disclaimer that nobody will read. The summaries focus on trending lifestyle topics like sports and entertainment - exactly the content that drives engagement for many publishers.

Similarweb's data shows organic traffic to news sites dropped from 2.3 billion visits at its peak in mid-2024 to under 1.7 billion now. For mobile users, it's even worse - over 75% of searches don't result in a website visit. Google's SERP has transformed from a starting point into the final destination.

As a user, I love it. I don't want to have to click through an ad-infested swamp to find news articles, and I'd much rather have an AI give me an (ideally reliable and error-free) summary.
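For the curious, the article's numbers hold together - here's the arithmetic (figures are as reported above; the calculations are mine):

```python
# Zero-click searches: 56% before AI Overviews launched, 69% after.
zero_click_before = 56
zero_click_after = 69
increase_pp = zero_click_after - zero_click_before  # percentage points, not percent
print(f"Zero-click increase: {increase_pp} percentage points")  # 13

# Similarweb: news-site visits fell from 2.3B (mid-2024 peak) to under 1.7B.
peak_visits_b = 2.3
current_visits_b = 1.7
drop_pct = (peak_visits_b - current_visits_b) / peak_visits_b * 100
print(f"Relative traffic drop: ~{drop_pct:.0f}%")  # roughly a quarter of all visits, gone
```

Note the 13-point jump is a change in *share* of searches, while the Similarweb figure is a drop in absolute visits - two different lenses on the same squeeze.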
As a citizen of a democratic country and of the internet, I'm deeply concerned that consolidating traffic in the hands of tech giants while further gutting the economics of journalism is going to be really bad for everyone.

And speaking of keeping users in walled gardens, OpenAI is getting ready for their own ecosystem play. They're partnering with Shopify to add native checkout capabilities directly in ChatGPT, with plans to take commissions on every sale (rumored to be around 2%). Code strings like "buy_now" and "shopify_checkout_url" have already been spotted in ChatGPT's web bundle. This isn't just about convenience - it's about turning ChatGPT into a full-funnel shopping tool where discovery, comparison, and purchase all happen without ever leaving the chat.

OpenAI burns through $3-4 billion annually just to keep ChatGPT running, and they're chasing every revenue stream they can. By taking a slice of e-commerce transactions - especially from their massive free user base - they could earn a ton while fundamentally reshaping online shopping. Imagine asking ChatGPT for the best running shoes under $150 and completing the purchase right there in the chat. No clicking through to Amazon, no comparison shopping across tabs, no abandoned carts. They'll just charge the retailer a 2% commission that you never even see. As a user, that is exactly how I would like to shop online; for the merchants, maybe having OpenAI as their corporate overlord instead of Amazon won't change anything?

This is the real strategic threat to Google. It's not just about search anymore - it's about owning the entire user journey from query to transaction. While Google keeps users on their SERP with AI summaries, OpenAI is building a conversational commerce engine that could bypass search entirely. Publishers lose traffic to Google's zero-click searches, then retailers lose transactions to ChatGPT's in-chat purchases.
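To make the commission model concrete, here's the arithmetic on that shoe purchase - remember the 2% rate is only a rumor at this point, and the $150 price is just my example:

```python
# Rumored ChatGPT in-chat commerce commission (unconfirmed figure from the story).
COMMISSION_RATE = 0.02

def merchant_commission(sale_price: float, rate: float = COMMISSION_RATE) -> float:
    """Commission the retailer pays on an in-chat sale; invisible to the buyer."""
    return round(sale_price * rate, 2)

# The $150 running-shoes example: the buyer pays $150, the merchant gives up $3.
print(merchant_commission(150.00))  # 3.0
```

$3 on a pair of shoes sounds trivial, which is exactly why it works: multiplied across a free user base of hundreds of millions, a fee nobody sees becomes a serious revenue stream.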
The web as we know it - that beautiful, chaotic network of independent sites - is being absorbed into a handful of AI-powered platforms. Welcome to the walled garden wars of 2025.

I don't think it's all doom and gloom though. As open models get smarter, it becomes more feasible for people to have their own sovereign AI that works the way they want, summarising their news and buying stuff online. You don't need the best AI in the world to do that, you just need one that is good enough. And open models are getting better month on month. So there is hope for the free and open web.

Mistral drops Voxtral: takes the throne for voice models - blog post

Speaking of the free and open web, Mistral just released Voxtral, their first open-source audio model family, and it's throwing serious shade at OpenAI's Whisper monopoly. There are two models - a beefy 24B parameter version for production and a lean 3B variant for edge deployment, both under the Apache 2.0 license - at less than half the price of comparable solutions. Nice one Mistral.

Voxtral can transcribe 30 minutes of audio, or understand up to 40 minutes for Q&A and summarization. It's natively multilingual with automatic language detection, supporting eight major languages including Hindi and Portuguese. The models aren't just transcription engines either - they can do built-in Q&A, generate structured summaries, and even execute function calls directly from voice commands. That last bit is huge for building voice-controlled applications, and it's the first time I've seen this ability in a voice model that wasn't a massive proprietary multimodal monster.

Pricing starts at $0.001 per minute for the API, compared to Whisper's $0.006 and GPT-4o-mini-transcribe's $0.003. Mistral claims Voxtral Mini Transcribe outperforms Whisper for less than half the cost, while Voxtral Small matches ElevenLabs Scribe (also at half the price).
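Using the per-minute prices quoted above, here's what a typical transcription job costs on each API (prices as quoted in the post; real bills may differ with rounding, minimums, and model-name changes):

```python
# Per-minute API transcription prices quoted in the post (USD).
PRICES_PER_MIN = {
    "voxtral-mini-transcribe": 0.001,
    "gpt-4o-mini-transcribe": 0.003,
    "whisper": 0.006,
}

def transcription_cost(minutes: float, model: str) -> float:
    """USD cost to transcribe `minutes` of audio with the given model."""
    return minutes * PRICES_PER_MIN[model]

# A 30-minute recording (Voxtral's maximum transcription window):
for model, _ in sorted(PRICES_PER_MIN.items(), key=lambda kv: kv[1]):
    print(f"{model}: ${transcription_cost(30, model):.3f}")
# Voxtral comes out around $0.03 for the half hour vs roughly $0.18 for Whisper -
# a 6x gap, even bigger than the "less than half the cost" headline claim.
```

At these prices, transcribing an hour of audio every single day for a year costs around $22 on Voxtral Mini - cheap enough that the interesting constraint stops being cost and becomes what you build on top.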
The benchmarks back this up - it beats GPT-4o mini and Gemini 2.5 Flash across all tasks, with particularly strong performance in European languages.

This is classic Mistral: find where Big Tech is overcharging for closed models and release an open alternative that's both better and cheaper. With their reported billion-dollar raise from Abu Dhabi's MGX fund in the works, they're positioning themselves as the open-source champion against the walled gardens of OpenAI and Google. The fact that you can run the 3B model locally for edge applications opens up possibilities for privacy-conscious voice apps that don't need to phone home to the cloud.

Cheers,

PS: Learn to code a little bit is open for enrollments again - the course kicks off on August 11, and early birds who sign up before July 25 will get an additional free one-hour workshop on using AI to build marketing websites.
Each week I share the three most interesting things I found in AI