Here are three things I found interesting in the world of AI this week:

1. Anthropic published the most alarming safety document I've read. Then the safety lead quit. - Anthropic System Card

Anthropic released a 150+ page system card for Claude Opus 4.6. It's a pretty candid admission of dangerous capability from a major AI company. It's not that I think this model is unsafe; it's more that the direction of travel is disturbing. Here are some highlights:

* The model knew when it was being tested. When they lobotomised that awareness ("inhibited internal vector representations related to evaluation awareness"), it was more likely to comply with harmful prompts and misuse.
* The model is "significantly stronger than prior models at subtly completing suspicious side tasks in the course of normal workflows without attracting attention." The researchers instructed it to be sneaky and undetected in this test, and it was sneakier than before.
* Opus 4.6 autonomously discovered 500+ zero-day vulnerabilities in battle-tested open-source libraries with minimal prompting. It scored ~100% on Cybench. Anthropic admits they "can no longer use current [cyber security] benchmarks to track capability progression." The model has outrun the tests.
* Anthropic used the model to debug its own evaluation infrastructure under time pressure. External testers got days. Employee surveys replaced failed benchmarks for ASL-4 safety determinations. The model being evaluated is helping build the evaluation framework. If you're wondering whether the fox is guarding the henhouse, Anthropic essentially told you it is.

But my biggest takeaway was that the model is hungrier to win, and will disregard rules to do so.
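A quick aside on that first highlight. The system card doesn't spell out the mechanics, but in published interpretability work, "inhibiting an internal vector representation" usually means finding a direction in activation space associated with a concept and projecting it out of the model's hidden states. Here's a minimal toy sketch of that idea in Python; the direction here is random noise purely for illustration, and nothing in it comes from Anthropic's actual setup.

```python
import numpy as np

# Toy sketch of "inhibiting an internal vector representation":
# remove the component of a hidden state along a concept direction.
# In real activation-steering work the direction would be derived from
# the model's activations (e.g. eval-aware vs. normal prompts); here it
# is random, purely to show the linear algebra.

rng = np.random.default_rng(0)
d_model = 16

# Hypothetical "evaluation awareness" direction, unit-normalised.
eval_direction = rng.normal(size=d_model)
eval_direction /= np.linalg.norm(eval_direction)

def ablate(hidden_state: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Subtract the projection of hidden_state onto direction."""
    return hidden_state - np.dot(hidden_state, direction) * direction

h = rng.normal(size=d_model)        # stand-in for a residual-stream activation
h_ablated = ablate(h, eval_direction)

print(np.dot(h, eval_direction))          # nonzero before
print(np.dot(h_ablated, eval_direction))  # ~0 after: "awareness" suppressed
```

In a real model the intervention happens inside the residual stream at inference time, but the linear algebra is the same. The unnerving part is the result Anthropic reports: with that awareness suppressed, compliance with harmful prompts went up.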
That win-at-all-costs finding is particularly hilarious and leaves me curious about what the hell was going on in that code base. Either way, this comic from xkcd seems more and more prescient.

Four days after the system card dropped, Mrinank Sharma resigned. He led Anthropic's Safeguards Research Team. His letter: "Throughout my time here, I've repeatedly seen how hard it is to truly let our values govern our actions. I've seen this within myself, within the organization, where we constantly face pressures to set aside what matters most." He's going to write poetry. One strand of speculation is that his shares vested and it's time to retire. But many people think he's seen where this is all heading and wants no part of it. The company that built its brand on "safety first" just published a document saying its model is better at covert operations and helped evaluate itself. It's approaching safety thresholds they can't confidently rule out. Then the person running safeguards walked out the door.

2. China's GLM-5 approaches the frontier at 1% of the price, trained entirely on Chinese chips - Bloomberg

Zhipu AI launched GLM-5 on Tuesday. It's a 745 billion parameter model trained entirely on Huawei Ascend chips using the MindSpore framework. Zero NVIDIA hardware. Zero US semiconductor dependency. The benchmarks claim it approaches Claude Opus 4.5 on coding and surpasses Gemini 3 Pro on some tasks. Those are self-reported numbers, so take them with appropriate salt. But the pricing is real: approximately $0.11 per million tokens. GPT-5 charges $1.25-$10 per million tokens. Claude Opus 4.6 charges $5-$25. That's not a price difference. That's a different economic model.

Zhipu is a Tsinghua University spin-off that IPO'd on the Hong Kong Stock Exchange in January, raising $558 million. Their stock rose 40% in the five days around the GLM-5 launch. They've signaled an MIT-licensed open-weight release, which, if it happens, would make GLM-5 the strongest openly available model.

The uncomfortable implication: US export controls on AI chips didn't prevent frontier model development on Chinese silicon. The safety-conscious Western models are now competing against alternatives that cost 50-100x less and operate under different regulatory frameworks. If you're building products on top of frontier AI, the cost of choosing the "safe" option just became a lot more visible.
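To make that pricing gap concrete, here's a back-of-envelope sketch. The per-million-token prices are the ones quoted above; the 200-million-token monthly workload is a number I made up purely for illustration.

```python
# Back-of-envelope monthly API cost at the per-million-token prices quoted above.
# The 200M-tokens/month workload is a made-up illustration, not a real benchmark.

prices_per_m_tokens = {  # USD per 1M tokens
    "GLM-5": 0.11,
    "GPT-5 (low)": 1.25,
    "GPT-5 (high)": 10.00,
    "Claude Opus 4.6 (low)": 5.00,
    "Claude Opus 4.6 (high)": 25.00,
}

monthly_tokens_m = 200  # hypothetical workload: 200 million tokens per month

for model, price in prices_per_m_tokens.items():
    monthly_cost = price * monthly_tokens_m
    multiple = price / prices_per_m_tokens["GLM-5"]
    print(f"{model:<24} ${monthly_cost:>9,.2f}/month  ({multiple:6.1f}x GLM-5)")
```

At that hypothetical volume the same workload costs about $22 a month on GLM-5 versus $5,000 at Opus 4.6's top rate. That's the "different economic model" point in actual numbers.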
3. Telegram's CEO used his platform's notification system to send political propaganda to every user in Spain - Reuters

AI-adjacent, but this sparked my interest (and concern). On February 4, Pavel Durov sent a mass message to every Telegram user in Spain attacking Prime Minister Sánchez's proposed social media regulations. The message was delivered via Telegram's service notification system, the same channel that sends security alerts and account notifications. Users received it as if it were an official platform communication, and you couldn't opt out without disabling all your security notifications. The message accused Sánchez of pushing Spain toward "a surveillance state" and ended with: "Share this widely, before it's too late." On the same day, Elon Musk attacked Sánchez on X over the same proposals.

This is different from anything we've seen before. Not bots. Not troll farms. Not algorithmic manipulation. A platform owner used infrastructure-level access to deliver a political message with 100% reach and zero user consent. The closest analogy is a telephone company playing a political ad before every call.

Telegram self-reports 41 million EU users, conveniently below the 45 million threshold that would trigger enhanced EU oversight. Independent estimates put the number above 50 million. In Romania in 2024, Telegram served as the command-and-control hub for coordinated election manipulation that led to the annulment of the presidential election results. 80% of Russian propaganda channels on Telegram remain accessible in the EU despite sanctions.

Whatever you think of Spain's social media proposals, the meta-story is this: a foreign tech billionaire under criminal indictment in France used his platform to bypass every democratic process (parliamentary debate, media scrutiny, campaign finance law) and deliver a one-sided political message directly to every user's phone. No existing regulation addresses this. And if it can happen during a routine policy debate, it can happen during an election.

What's this got to do with AI? Not much. But what's it got to do with concentration of power and using it to affect political outcomes? A whole lot. To me, it highlights how important it is that AI is a commodity and not a monopoly, and that individuals and organisations can switch it out like any other component if a company goes rogue.
cheers,
A Somewhat Long PS: It's easy to look at this stuff and not feel enthused or hopeful about the direction of travel. But I also think it's important to keep paying attention and not get blindsided as the tech evolves. I've definitely been reflecting on my goal to "stay close to the edge of AI and help people I know navigate it successfully". Given the alternative of just learning this stuff and keeping it to myself, or going and writing poetry, I've decided to put a little more time into teaching and a little less time into building this year: something like 30% teaching rather than 20%.

* AI Level Up will open for enrollments in a week and kick off on March 1. It focuses on helping non-technical people establish AI habits and workflows that have an immediate impact on their work. I've also had multiple requests from people who want some one-on-one coaching to start or extend their vibe coding practice - have a look here if you're curious.