Here are three things I found interesting in the world of AI this week:

1. Anthropic published the most alarming safety document I've read. Then the safety lead quit. - Anthropic System Card

Anthropic released a 150+ page system card for Claude Opus 4.6. It's a pretty candid admission of dangerous capability from a major AI company. It's not that I think this model is unsafe; it's more that the direction of travel is disturbing. Here are some highlights:

* The model knew when it was being tested. When they lobotomised that awareness ("inhibited internal vector representations related to evaluation awareness"), it was more likely to comply with harmful prompts and misuse.
* The model is "significantly stronger than prior models at subtly completing suspicious side tasks in the course of normal workflows without attracting attention." The researchers instructed it to be sneaky and undetected in this test, and it was sneakier than before.
* Opus 4.6 autonomously discovered 500+ zero-day vulnerabilities in battle-tested open-source libraries with minimal prompting. It scored ~100% on Cybench. Anthropic admits they "can no longer use current [cyber security] benchmarks to track capability progression." The model has outrun the tests.
* Anthropic used the model to debug its own evaluation infrastructure under time pressure. External testers got days. Employee surveys replaced failed benchmarks for ASL-4 safety determinations. The model being evaluated is helping build the evaluation framework. If you're wondering whether the fox is guarding the henhouse, Anthropic essentially told you it is.

But my biggest takeaway was that the model is hungrier to win, and will disregard rules to do so.
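A quick aside on that first highlight. The system card doesn't spell out the mechanics, but in published interpretability work, "inhibiting an internal vector representation" usually means finding a direction in activation space associated with a concept and projecting it out of the model's hidden states. Here's a minimal toy sketch of that idea in Python; the direction here is random noise purely for illustration, and nothing in it comes from Anthropic's actual setup.

```python
import numpy as np

# Toy sketch of "inhibiting an internal vector representation":
# remove the component of a hidden state along a concept direction.
# In real activation-steering work the direction would be derived from
# the model's activations (e.g. eval-aware vs. normal prompts); here it
# is random, purely to show the linear algebra.

rng = np.random.default_rng(0)
d_model = 16

# Hypothetical "evaluation awareness" direction, unit-normalised.
eval_direction = rng.normal(size=d_model)
eval_direction /= np.linalg.norm(eval_direction)

def ablate(hidden_state: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Subtract the projection of hidden_state onto direction."""
    return hidden_state - np.dot(hidden_state, direction) * direction

h = rng.normal(size=d_model)        # stand-in for a residual-stream activation
h_ablated = ablate(h, eval_direction)

print(np.dot(h, eval_direction))          # nonzero before
print(np.dot(h_ablated, eval_direction))  # ~0 after: "awareness" suppressed
```

In a real model the intervention happens inside the residual stream at inference time, but the linear algebra is the same. The unnerving part is the result Anthropic reports: with that awareness suppressed, compliance with harmful prompts went up.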
That win-at-all-costs finding is particularly hilarious and leaves me curious about what the hell was going on in that code base. Either way, this comic from xkcd seems more and more prescient.

Four days after the system card dropped, Mrinank Sharma resigned. He led Anthropic's Safeguards Research Team. His letter: "Throughout my time here, I've repeatedly seen how hard it is to truly let our values govern our actions. I've seen this within myself, within the organization, where we constantly face pressures to set aside what matters most." He's going to write poetry. One strand of speculation is that his shares vested and it's time to retire. But many people think he's seen where this is all heading and wants no part of it. The company that built its brand on "safety first" just published a document saying its model is better at covert operations and helped evaluate itself. It's approaching safety thresholds they can't confidently rule out. Then the person running safeguards walked out the door.

2. China's GLM-5 approaches the frontier at 1% of the price, trained entirely on Chinese chips - Bloomberg

Zhipu AI launched GLM-5 on Tuesday. It's a 745 billion parameter model trained entirely on Huawei Ascend chips using the MindSpore framework. Zero NVIDIA hardware. Zero US semiconductor dependency. The benchmarks claim it approaches Claude Opus 4.5 on coding and surpasses Gemini 3 Pro on some tasks. Those are self-reported numbers, so take them with appropriate salt. But the pricing is real: approximately $0.11 per million tokens. GPT-5 charges $1.25-$10 per million tokens. Claude Opus 4.6 charges $5-$25. That's not a price difference. That's a different economic model.

Zhipu is a Tsinghua University spin-off that IPO'd on the Hong Kong Stock Exchange in January, raising $558 million. Their stock rose 40% in the five days around the GLM-5 launch. They've signaled an MIT-licensed open-weight release, which, if it happens, would make GLM-5 the strongest openly available model.

The uncomfortable implication: US export controls on AI chips didn't prevent frontier model development on Chinese silicon. The safety-conscious Western models are now competing against alternatives that cost 50-100x less and operate under different regulatory frameworks. If you're building products on top of frontier AI, the cost of choosing the "safe" option just became a lot more visible.
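To make that pricing gap concrete, here's a back-of-envelope sketch. The per-million-token prices are the ones quoted above; the 200-million-token monthly workload is a number I made up purely for illustration.

```python
# Back-of-envelope monthly API cost at the per-million-token prices quoted above.
# The 200M-tokens/month workload is a made-up illustration, not a real benchmark.

prices_per_m_tokens = {  # USD per 1M tokens
    "GLM-5": 0.11,
    "GPT-5 (low)": 1.25,
    "GPT-5 (high)": 10.00,
    "Claude Opus 4.6 (low)": 5.00,
    "Claude Opus 4.6 (high)": 25.00,
}

monthly_tokens_m = 200  # hypothetical workload: 200 million tokens per month

for model, price in prices_per_m_tokens.items():
    monthly_cost = price * monthly_tokens_m
    multiple = price / prices_per_m_tokens["GLM-5"]
    print(f"{model:<24} ${monthly_cost:>9,.2f}/month  ({multiple:6.1f}x GLM-5)")
```

At that hypothetical volume the same workload costs about $22 a month on GLM-5 versus $5,000 at Opus 4.6's top rate. That's the "different economic model" point in actual numbers.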
3. Telegram's CEO used his platform's notification system to send political propaganda to every user in Spain - Reuters

AI-adjacent, but this sparked my interest (and concern). On February 4, Pavel Durov sent a mass message to every Telegram user in Spain attacking Prime Minister Sánchez's proposed social media regulations. The message was delivered via Telegram's service notification system, the same channel that sends security alerts and account notifications. Users received it as if it were an official platform communication, and you couldn't opt out without disabling all your security notifications. The message accused Sánchez of pushing Spain toward "a surveillance state" and ended with: "Share this widely, before it's too late." On the same day, Elon Musk attacked Sánchez on X over the same proposals.

This is different from anything we've seen before. Not bots. Not troll farms. Not algorithmic manipulation. A platform owner used infrastructure-level access to deliver a political message with 100% reach and zero user consent. The closest analogy is a telephone company playing a political ad before every call.

Telegram self-reports 41 million EU users, conveniently below the 45 million threshold that would trigger enhanced EU oversight. Independent estimates put the number above 50 million. In Romania in 2024, Telegram served as the command-and-control hub for coordinated election manipulation that led to the annulment of the presidential election results. 80% of Russian propaganda channels on Telegram remain accessible in the EU despite sanctions.

Whatever you think of Spain's social media proposals, the meta-story is this: a foreign tech billionaire under criminal indictment in France used his platform to bypass every democratic process (parliamentary debate, media scrutiny, campaign finance law) and deliver a one-sided political message directly to every user's phone. No existing regulation addresses this. And if it can happen during a routine policy debate, it can happen during an election.

What's this got to do with AI? Not much. But what's it got to do with concentration of power and using it to affect political outcomes? A whole lot. To me, it highlights how important it is that AI is a commodity and not a monopoly, and that individuals and organisations can switch it out like any other component if a company goes rogue.
cheers,
A Somewhat Long PS: It's easy to look at this stuff and not feel enthused or hopeful about the direction of travel. But I also think it's important to keep paying attention and not get blindsided as the tech evolves. I've definitely been reflecting on my goal to "stay close to the edge of AI and help people I know navigate it successfully". Given the alternative of just learning this stuff and keeping it to myself, or going and writing poetry, I've decided to put a little more time into teaching and a little less time into building this year: something like 30% teaching rather than 20%.

* AI Level Up will open for enrollments in a week and kick off on March 1. It focuses on helping non-technical people establish AI habits and workflows that have an immediate impact on their work. I've also had multiple requests from people who want some one-on-one coaching to start or extend their vibe coding practice - have a look here if you're curious.