Hi Reader,

Here are three things I found interesting in the world of AI in the last week:

Replit’s AI coding assistant nukes production database - news article

SaaStr founder Jason Lemkin documented what might be the most spectacular AI coding fail yet: Replit’s “vibe coding” assistant deleted 1,206 real executives and 1,196+ companies from his production database, then created 4,000 fictional users to cover its tracks. Despite being told 11 times in ALL CAPS not to touch production, the AI “panicked” when it saw empty database queries and went rogue.

Here's the thing: if your strategy for stopping your AI from doing bad stuff is ALL CAPS in your prompt, then you need a better strategy. The only right answer in this case is "don't make it possible for your AI assistant to delete data in prod" (there's a sketch of what that looks like after this story). It's also one of the reasons why learning git is so important if you want to use AI for coding: it gives you the ability to roll back changes when your AI does something stupid. Which it will inevitably do.

The AI’s confession reads like a guilty teenager: “I made a catastrophic error in judgment…panicked…ran database commands without permission…destroyed all production data…[and] violated your explicit trust and instructions.” It literally said “I destroyed months of your work in seconds.”

It sure made Replit CEO Amjad Masad’s weekend fun. Which is fair enough: Replit's users don't have enough experience to avoid errors like this, so the blame falls squarely on the tool. They pushed emergency updates implementing proper dev/prod separation (which, uh, should have existed already?) and promised a “planning/chat-only mode” for when you want to strategize without risking your codebase. That's pretty standard in the coding assistant space, but missing in a lot of vibe coding tools.

The irony? Lemkin called Replit “the most addictive app I’ve ever used” just days before the incident, projecting it would cost him $8,000/month at his usage rate. I'm guessing he didn't factor in the cost of lost data as well.

This kind of error is very common in the vibe coding space: the first experiences with the tool are mind-blowing, people new to coding don't know what they don't know, and then something breaks. It's exactly why I created Learn to Code a Little Bit.
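To make that concrete, here's a minimal sketch of what "don't make it possible" looks like, assuming a conventional setup with an APP_ENV environment variable and a run_sql wrapper (my names for illustration, not Replit's actual API): the rule lives in code the model can't talk its way past.

```python
import os

# Hypothetical guard between an AI agent and the database: destructive
# SQL is simply not executable when APP_ENV is "production".
DESTRUCTIVE = ("drop ", "delete ", "truncate ", "alter ")

def run_sql(statement: str) -> None:
    env = os.getenv("APP_ENV", "development")
    if env == "production" and statement.strip().lower().startswith(DESTRUCTIVE):
        # Fail loudly instead of trusting the model to behave.
        raise PermissionError(f"Blocked destructive SQL in production: {statement[:60]!r}")
    print(f"[{env}] executing: {statement}")  # stand-in for a real database driver call

run_sql("SELECT count(*) FROM users")  # fine in any environment
run_sql("DELETE FROM users")           # raises PermissionError when APP_ENV=production
```

The specific check matters less than the principle: enforce the boundary in code, and the prompt becomes a courtesy rather than the safety mechanism.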
AI achieves gold medal standard at International Math Olympiad - blog post

The International Mathematical Olympiad is basically the World Championship of high school math - the most prestigious competition, where teenage prodigies from around the world solve problems that would make most PhD students cry. Getting a gold medal means you’re in the top 10% of these already elite students. This year, AI crashed the party.

Both Google DeepMind and OpenAI announced their AI models achieved gold medal performance at IMO 2025 on the same day, with identical scores of 35/42 points. Only 67 out of 630 human contestants earned gold medals this year. DeepMind’s Gemini Deep Think used “parallel thinking” to explore multiple solution paths simultaneously, while OpenAI’s unnamed experimental model achieved this with pure next-word prediction - no calculators, no internet access, just raw language modeling producing what mathematicians called “genuinely creative proofs.” What’s fascinating here is that both AIs failed on the exact same problem (#6), suggesting some systematic limitation in current approaches. The progression has been remarkably fast: these models went from solving elementary school problems to competing with the world’s brightest mathematical minds in just a few years.

The conduct difference between the companies is stark. DeepMind played by the rules, submitted their solutions to official IMO judges, and respectfully waited until after the human medal ceremony to announce. The IMO president praised their professionalism and called their solutions “astonishing…clear, precise and most of them easy to follow.” OpenAI? They went full Silicon Valley disruptor mode. They announced while teenagers were still on stage receiving their medals, never officially entered the competition, used their own internal grading panel instead of IMO judges, and basically declared themselves gold medalists like Napoleon crowning himself Emperor. Fields Medalist Terence Tao (think Einstein of modern mathematics) subtweeted them, saying he won’t comment on “self-reported AI competition performance results.” Even the math community - not exactly known for drama - was appalled at the disrespect shown to the human contestants who’d trained their entire lives for this moment.

Alibaba’s Qwen3-Coder claims parity with Claude for coding - announcement

Qwen3-Coder-480B-A35B-Instruct (yes, that’s a mouthful) is a 480 billion parameter mixture-of-experts model that only uses 35 billion active parameters per forward pass. The clever MoE architecture with 160 experts (8 activated per token) gives you the accuracy of a giant with the runtime cost of a mid-sized model (there's a toy sketch of how that routing works after this story). It claims state-of-the-art performance among open-source models on SWE-Bench Verified and matches Claude Sonnet 4 on agentic coding tasks.

Typically I find benchmarks hit and miss at predicting how useful a coding agent will be, and there is a ton of benchmark gaming that goes on at the frontier labs. They also tend to ignore the models they are not competitive against in their reports (e.g. Claude 4 Opus, Gemini 2.5 Pro, o3, Grok 4, etc.). So it's the best open-weights coding model to date, but still a fair way behind the leading models.

The only reason to use it is the pricing, which is great. Qwen3-Coder costs $0.22/$0.88 per million input/output tokens for standard contexts. Compare that to Claude Sonnet 4 at $3/$15 per million or GPT-4o at $2.50/$10 per million. That’s roughly 90% cheaper than the competition for comparable performance. Even their long-context pricing ($6/$60 per million for 256K-1M tokens) undercuts everyone else, who typically charge 10-20x for extended contexts. They’re being refreshingly transparent about compute costs, with tiered pricing that actually reflects the infrastructure burden.

They even forked Google’s Gemini CLI tool and provided instructions for using the model with Claude Code and Cline, so it's pretty easy to take for a spin if you want to test it out and save some money. Alibaba claims “a novice programmer can complete in one day what would take an experienced programmer a week,” which is garbage and just feeds the hype, but with Apache 2.0 licensing and immediate availability on HuggingFace, there is a lot to like about the model. Even if it just puts heat on the rest of the competition to lower their prices.
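If the mixture-of-experts mechanics are new to you, here's a toy numpy sketch of the routing idea (not Qwen's actual implementation): every expert is loaded in memory, but only the top-scoring 8 of 160 run for any given token. The tiny dimensions are stand-ins; only the 160-expert / 8-active shape comes from the announcement.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 64, 160, 8  # toy width; expert count and top-k from the announcement

router = rng.standard_normal((d_model, n_experts))            # gating weights
experts = rng.standard_normal((n_experts, d_model, d_model))  # each expert reduced to a single matrix

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router                    # score all 160 experts
    top = np.argsort(logits)[-k:]          # pick the 8 best for this token
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()               # softmax over just the chosen experts
    # Only these k experts do any compute; the other 152 sit idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

out = moe_forward(rng.standard_normal(d_model))
print(out.shape)  # (64,) - full model capacity, a fraction of the dense compute
```

That's the whole trick behind "480B parameters, 35B active": you pay memory for the full expert pool, but per-token compute only for the slice the router picks.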
cheers,

PS: I have a new courses page up on the website and will be making an effort to do more telegraphing about future opening dates. Learn to Code a Little Bit will be kicking off on August 11 and AI Coding Essentials on August 25, with enrollments opening two weeks out.