Meta's benchmark drama, OpenAI's $40B cash infusion and Shopify's AI ultimatum


Hi Reader,

Here are three things I found interesting in the world of AI in the last week.

Meta’s Llama 4 brings multimodal power and benchmark drama - announcement

Meta just dropped their Llama 4 family and it’s a mixture-of-experts (MoE) party with some impressive specs and equally impressive drama. The headliner is Llama 4 Maverick with 17 billion active parameters but a massive 400 billion total parameters spread across 128 experts. This MoE architecture means only a fraction of the network activates for any given input - an approach rumored to have been one of GPT-4’s key innovations back in the day.
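To make the MoE idea concrete, here's a toy sketch of top-k expert routing in PyTorch - purely illustrative, not Meta's implementation, with made-up dimensions and expert counts (real routers also need load-balancing losses and a lot more care):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router picks the top-k experts per
    token, so only a fraction of total parameters fire for any input."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                       # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalize chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```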

The multimodal capabilities are finally native rather than bolted-on, with Llama 4 models handling text and images out of the box across 12 languages. Meta is touting a massive 10 million token context window on the smaller Llama 4 Scout model (which somehow fits on a single H100), though early reports suggest actual recall and reasoning over that full context is shaky at best. The gap between theoretical context size and effective use continues to be one of the industry’s most over-hyped metrics.
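For a rough sense of how Scout fits on one GPU, here's a back-of-envelope estimate assuming Meta's reported figures of roughly 109 billion total parameters and the int4 quantization they mention (weights only - activations and KV cache come on top):

```python
# Rough weight-memory estimate for Llama 4 Scout on a single H100.
# Assumes Meta's reported figures: ~109B total parameters, int4 weights.
total_params = 109e9
bytes_per_param = 0.5  # int4 = 4 bits = half a byte
weights_gb = total_params * bytes_per_param / 1e9
print(f"weights: ~{weights_gb:.1f} GB vs 80 GB of H100 memory")
# weights: ~54.5 GB vs 80 GB - it fits, but leaves limited headroom for
# the KV cache a long context needs, let alone 10 million tokens of it.
```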

But the launch hasn’t been without controversy. Meta initially claimed the #2 spot on LMArena with an Elo score of 1417, positioning Llama 4 Maverick above GPT-4o and just below Google’s Gemini 2.5 Pro. Interestingly, reports suggest an unreleased, experimental version of Llama 4 was used for these benchmarks rather than the publicly available models. The celebrations were short-lived though, as accusations of benchmark gaming quickly surfaced. Meta’s VP of GenAI Ahmad Al-Dahle had to publicly deny they trained on test sets, blaming any weirdness on “implementation bugs” from the rushed rollout.
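For a sense of scale on those Arena numbers: Elo-style ratings (LMArena actually fits a Bradley-Terry model, which works out the same for pairwise odds) turn score gaps into expected head-to-head win rates. A quick sketch - the second rating below is a made-up rival, not GPT-4o's actual score:

```python
# Expected head-to-head win rate from an Elo-style rating gap.
def win_probability(rating_a: float, rating_b: float) -> float:
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# Illustrative only: the claimed 1417 against a hypothetical 1380 rival.
print(f"{win_probability(1417, 1380):.0%}")  # ~55% - small gap, small edge
```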

Particularly concerning is Llama 4 Maverick’s dismal performance on coding tasks - scoring just 16% on the aider polyglot benchmark, making it essentially unusable for serious programming work. To put that in perspective, Haiku 3.5 (Claude's last-gen small model) scored 28% six months ago, and Gemini 2.5 Pro is at 73%.

It could be that these are just teething issues, as their VP claims, and there's a diamond hiding underneath a botched launch - but I'm skeptical.

OpenAI raised a bunch of money - news article

SoftBank is leading a massive $40 billion investment at a $300 billion valuation - nearly double OpenAI’s valuation from just six months ago. The deal has an unusual structure: $10 billion arrives in mid-April, with the remaining $30 billion contingent on OpenAI restructuring into a conventional for-profit entity by year-end. Microsoft, Coatue, Altimeter and Thrive Capital are reportedly joining SoftBank in the round. With this raise, OpenAI joins SpaceX, ByteDance and Stripe in the ultra-exclusive club of the world’s most valuable private companies.

ChatGPT now has 500 million weekly active users and 700 million monthly active users. That growth is driving big revenue projections - The Verge reports OpenAI generated $3.7 billion in revenue in 2024, expects $12.7 billion this year, and $29.4 billion in 2026. That’s roughly an eightfold jump in just two years. Despite this, they don’t expect positive cash flow until 2027 thanks to their insatiable appetite for compute.
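Quick sanity check on that trajectory, using The Verge's figures:

```python
# The Verge's reported/projected OpenAI revenue, in billions of dollars.
revenue = {2024: 3.7, 2025: 12.7, 2026: 29.4}
multiple = revenue[2026] / revenue[2024]
print(f"2024 -> 2026: {multiple:.1f}x growth ({multiple - 1:.0%} increase)")
# 2024 -> 2026: 7.9x growth (695% increase)
```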

OpenAI’s new image generator is a massive hit. Users created over 700 million images in just the first week since launch - about 100 million per day. The company had to rapidly scale its image generation capacity, which Altman described as “melting our GPUs.” It was notable how quickly access expanded - after initially limiting the feature to paying users, they opened it to all ChatGPT users with daily caps.

OpenAI is delaying GPT-5 while releasing models they previously shelved. TechCrunch confirms both “o3” and “o4-mini” will launch within weeks, with GPT-5 now expected “in a few months.” The delay reportedly stems from needing more time to integrate advanced reasoning features. Interestingly, GPT-5 will introduce tiered access - basic capabilities for everyone, but advanced reasoning reserved for paid subscribers.

I guess they got distracted trying to stop the servers from melting, but I don't trust a word of what they say - it's all spin.

Shopify CEO’s “AI or bust” memo sparks industry debate - memo

Shopify CEO Tobi Lütke dropped a provocative internal memo last week that basically says “use AI or don’t bother asking for more headcount.” The mandate, which Lütke proudly shared on social media, declares AI usage a “fundamental expectation” at Shopify and requires teams to demonstrate they’ve exhausted AI options before requesting additional human resources. His blunt message: “Stagnation is slow-motion failure.”

The mainstream business press (CNBC, WSJ) frames it as a logical efficiency move, investors see it as a bold vision statement, and rank-and-file tech workers on Reddit largely view it as “just another way to enforce a hiring freeze” without saying the quiet part out loud. As we've seen in Hollywood, AI means very different things to managers and workers.

What’s actually radical here isn’t the AI usage itself but the enforced nature of the mandate. While other tech giants like Microsoft and IBM are encouraging AI adoption, Shopify’s approach creates a new minimum skills threshold - one that potentially redefines the value of current employees based on how quickly they adapt to AI. Lütke is effectively running a real-time experiment in organizational transformation, with his company’s culture and talent retention as the test subjects.

I suspect we’ll see more CEOs follow suit in the coming months - probably with softer language but similar intent. The line between “use AI to multiply your impact” and “justify your salary versus an AI subscription” is getting blurrier by the week. Shopify just decided to draw that line with a Sharpie.

cheers,

JV

PS: We just hit week four of the learn to code a little bit course and I've loved seeing what people are working on. Given the current level of interest I'm expecting the next intake to be in June, though I should have some time to spend on promotion in a few weeks, so it may be a bit sooner.
