Meta's benchmark drama, OpenAI's $40B cash infusion and Shopify's AI ultimatum


Hi Reader,

Here are three things I found interesting in the world of AI in the last week.

Meta’s Llama 4 brings multimodal power and benchmark drama - announcement

Meta just dropped their Llama 4 family and it’s a mixture-of-experts (MoE) party with some impressive specs and equally impressive drama. The headliner is Llama 4 Maverick with 17 billion active parameters but a massive 400 billion total parameters spread across 128 experts. This MoE architecture means only a fraction of the network activates for any given input - an approach rumored to have been one of GPT-4’s key innovations back in the day.
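To make the MoE idea concrete, here's a toy sketch of top-k expert routing in PyTorch - purely illustrative, not Meta's implementation, with made-up dimensions and expert counts (real routers also need load-balancing losses and a lot more care):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router picks the top-k experts per
    token, so only a fraction of total parameters fire for any input."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                       # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalize chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```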

The multimodal capabilities are finally native rather than bolted-on, with Llama 4 models handling text and images out of the box across 12 languages. Meta is touting a massive 10 million token context window on the smaller Llama 4 Scout model (which somehow fits on a single H100), though early reports suggest actual recall and reasoning over that full context is shaky at best. The gap between theoretical context size and effective use continues to be one of the industry’s most over-hyped metrics.
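For a rough sense of how Scout fits on one GPU, here's a back-of-envelope estimate assuming Meta's reported figures of roughly 109 billion total parameters and the int4 quantization they mention (weights only - activations and KV cache come on top):

```python
# Rough weight-memory estimate for Llama 4 Scout on a single H100.
# Assumes Meta's reported figures: ~109B total parameters, int4 weights.
total_params = 109e9
bytes_per_param = 0.5  # int4 = 4 bits = half a byte
weights_gb = total_params * bytes_per_param / 1e9
print(f"weights: ~{weights_gb:.1f} GB vs 80 GB of H100 memory")
# weights: ~54.5 GB vs 80 GB - it fits, but leaves limited headroom for
# the KV cache a long context needs, let alone 10 million tokens of it.
```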

But the launch hasn’t been without controversy. Meta initially claimed the #2 spot on LMArena with an Elo score of 1417, positioning Llama 4 Maverick above GPT-4o and just below Google’s Gemini 2.5 Pro. Interestingly, reports suggest an unreleased, experimental version of Llama 4 was used for these benchmarks rather than the publicly available models. The celebrations were short-lived though, as accusations of benchmark gaming quickly surfaced. Meta’s VP of GenAI Ahmad Al-Dahle had to publicly deny they trained on test sets, blaming any weirdness on “implementation bugs” from the rushed rollout.
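For a sense of scale on those Arena numbers: Elo-style ratings (LMArena actually fits a Bradley-Terry model, which works out the same for pairwise odds) turn score gaps into expected head-to-head win rates. A quick sketch - the second rating below is a made-up rival, not GPT-4o's actual score:

```python
# Expected head-to-head win rate from an Elo-style rating gap.
def win_probability(rating_a: float, rating_b: float) -> float:
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# Illustrative only: the claimed 1417 against a hypothetical 1380 rival.
print(f"{win_probability(1417, 1380):.0%}")  # ~55% - small gap, small edge
```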

Particularly concerning is Llama 4 Maverick’s dismal performance on coding tasks - scoring just 16% on the aider polyglot benchmark, making it essentially unusable for serious programming work. To put that in perspective, Haiku 3.5 (Claude's last-gen small model) scored 28% six months ago, and Gemini 2.5 Pro is at 73%.

It could be that these are just teething issues, as their VP claims, and there's a diamond hiding underneath a botched launch - but I'm skeptical.

OpenAI raised a bunch of money - news article

SoftBank is leading a massive $40 billion investment at a $300 billion valuation - nearly double OpenAI’s valuation from just six months ago. The deal has an unusual structure: $10 billion arrives in mid-April, with the remaining $30 billion contingent on OpenAI restructuring into a conventional for-profit entity by year-end. Microsoft, Coatue, Altimeter and Thrive Capital are reportedly joining SoftBank in the round. With this raise, OpenAI joins SpaceX, ByteDance and Stripe in the ultra-exclusive club of the world’s most valuable private companies.

ChatGPT now has 500 million weekly active users and 700 million monthly active users. That growth is driving big revenue projections - The Verge reports OpenAI generated $3.7 billion in revenue in 2024, expects $12.7 billion this year, and $29.4 billion in 2026. That’s roughly an eightfold jump in just two years. Despite this, they don’t expect positive cash flow until 2027 thanks to their insatiable appetite for compute.
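Quick sanity check on that trajectory, using The Verge's figures:

```python
# The Verge's reported/projected OpenAI revenue, in billions of dollars.
revenue = {2024: 3.7, 2025: 12.7, 2026: 29.4}
multiple = revenue[2026] / revenue[2024]
print(f"2024 -> 2026: {multiple:.1f}x growth ({multiple - 1:.0%} increase)")
# 2024 -> 2026: 7.9x growth (695% increase)
```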

OpenAI’s new image generator is a massive hit. Users created over 700 million images in just the first week since launch - about 100 million per day. The company had to rapidly scale its image generation capacity, which Altman described as “melting our GPUs.” It was notable how quickly access expanded - after initially limiting the feature to paying users, they opened it to all ChatGPT users with daily caps.

OpenAI is delaying GPT-5 while releasing models they previously shelved. TechCrunch confirms both “o3” and “o4-mini” will launch within weeks, with GPT-5 now expected “in a few months.” The delay reportedly stems from needing more time to integrate advanced reasoning features. Interestingly, GPT-5 will introduce tiered access - basic capabilities for everyone, but advanced reasoning reserved for paid subscribers.

I guess they got distracted trying to stop the servers from melting, but I don't trust a word of what they say - it's all spin.

Shopify CEO’s “AI or bust” memo sparks industry debate - memo

Shopify CEO Tobi Lütke dropped a provocative internal memo last week that basically says “use AI or don’t bother asking for more headcount.” The mandate, which Lütke proudly shared on social media, declares AI usage a “fundamental expectation” at Shopify and requires teams to demonstrate they’ve exhausted AI options before requesting additional human resources. His blunt message: “Stagnation is slow-motion failure.”

The mainstream business press (CNBC, WSJ) frames it as a logical efficiency move, investors see it as a bold vision statement, and rank-and-file tech workers on Reddit largely view it as “just another way to enforce a hiring freeze” without saying the quiet part out loud. As we've seen in Hollywood, AI means very different things to managers and workers.

What’s actually radical here isn’t the AI usage itself but the enforced nature of the mandate. While other tech giants like Microsoft and IBM are encouraging AI adoption, Shopify’s approach creates a new minimum skills threshold - one that potentially redefines the value of current employees based on how quickly they adapt to AI. Lütke is effectively running a real-time experiment in organizational transformation, with his company’s culture and talent retention as the test subjects.

I suspect we’ll see more CEOs follow suit in the coming months - probably with softer language but similar intent. The line between “use AI to multiply your impact” and “justify your salary versus an AI subscription” is getting blurrier by the week. Shopify just decided to draw that line with a Sharpie.

cheers,

JV

PS: We just hit week four of the learn to code a little bit course and I've loved seeing what people are working on. Given the current level of interest I'm expecting the next intake to be in June, though I should have some time to spend on promotion in a few weeks, so it may be a bit sooner.
