Gemini 2.0 Flash images, OpenAI agent toolkits, Gemini Code Assist


Hi Reader,

Here are three things I found interesting in the world of AI in the last week:

Gemini 2.0 Flash now generates images natively - no more Imagen - blog post

Google just dropped a significant update to Gemini 2.0 Flash, bringing native image generation directly into the model. Previously, Google relied on a separate system (Imagen) to handle image generation, the same way ChatGPT relies on DALL-E 3. Now Gemini can create (and edit) images on its own without any additional models, which is a pretty big milestone for multi-modal models.
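If you'd rather poke at the native image output from code than from a UI, here's a minimal sketch using the google-genai Python SDK. The model name and response-modality config are my assumptions based on the experimental release, so check the current docs before copying this:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")  # placeholder key

response = client.models.generate_content(
    # Experimental image-capable model name at time of writing; may change
    model="gemini-2.0-flash-exp",
    contents="A watercolour painting of a lighthouse at dusk",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],  # ask for image output alongside text
    ),
)

# The image comes back as inline bytes in one of the response parts
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("lighthouse.png", "wb") as f:
            f.write(part.inline_data.data)
    elif part.text:
        print(part.text)
```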

But Google being Google - forever famous for search results about putting glue on pizza and eating rocks - it looks like they've undercooked the guardrails a little. They're obviously trying: they shipped ShieldGemma 2 as part of their Gemma 3 launch, which is specifically focused on stopping people from making dodgy images. That part seems to be working fine, but according to TechCrunch, users have discovered that Flash's image editing capabilities are extremely effective at removing watermarks from stock photos and copyrighted images - which is very much against the DMCA and lawsuit-worthy.

Within days of release, social media was filling with examples of perfectly cleaned images that previously had prominent watermarks from Getty, Shutterstock, and other providers. Google claims to have implemented safeguards against copyright infringement, but the early evidence suggests those guardrails aren't working as intended.

Which reminds me of one of my favourite programming jokes.

A programmer is about to open a bar and invites their tester friends to come and make sure it works fine. They order one drink, infinity drinks, 1.234532432 drinks, 1/3 of a drink, -1 drinks, π drinks, 四 drinks. They present a -$10 voucher and try to order 100 drinks with it. Everything works fine, so the programmer opens the bar. The first customer comes in, asks to use the restroom, and the bar immediately catches fire and burns to the ground.

It also reminds me of how capable these AI models are and how much work the companies have to do to restrict those capabilities. Imagine being an early tester of GPT-4 - wow, it can write code and poetry and summarise emails; oh, it can also help you make a bomb and gives great advice about how to dispose of a body undetected.

Still, Gemini Flash is pretty good at making and editing images - text rendering kind of works, and you can keep consistent characters across multiple images. You can play with it in Google AI Studio.

OpenAI launches complete agent-building platform - announcement

OpenAI just unveiled their new agent-building APIs and on first glance, it looks... fine. You can now access the tools that power Operator and Deep Research through the API and build your own versions. The centerpiece is the new Responses API, which essentially marries the simplicity of Chat Completions with the tool-use capabilities of the Assistants API - which, they've announced, will be sunset by mid-2026.

Chat Completions was the original OpenAI API. It became so popular that a whole lot of libraries and providers replicated it and promoted themselves as "OpenAI compatible" to make it very easy for developers to switch over to them.
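To make the difference concrete, here's a minimal sketch of the two call styles using the openai Python SDK. The model name is a placeholder and the built-in tool name may have changed since launch, so treat this as a rough shape rather than gospel:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Classic Chat Completions: you manage the message list yourself
chat = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarise this email: ..."}],
)
print(chat.choices[0].message.content)

# New Responses API: a single `input`, plus hosted tools like web search
resp = client.responses.create(
    model="gpt-4o-mini",  # placeholder model name
    input="What changed in OpenAI's latest agents announcement?",
    tools=[{"type": "web_search_preview"}],  # built-in tool; name may differ
)
print(resp.output_text)
```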

The Assistants API dates from when they released GPTs and wanted to give people access to the building blocks that drove them - and no one cared. It was more expensive than rolling your own RAG system and gave you a lot less control. I think it's fine for prototyping, or if you don't want to learn how to build RAG from scratch, but it's not something I would recommend relying on long term.

The Agents API feels more like Assistants than Chat Completions to me. Useful if you want to prototype something very quickly, or if you don't know how to set up an MCP client that can access tools like web search or computer use. Strategically, OpenAI desperately doesn't want to be a commodity AI provider competing on price/capability numbers; they would love people to deeply integrate with bespoke APIs that only they offer.

I suggest you spend your time learning about MCP servers instead.

Gemini Code Assist brings intelligent code reviews to GitHub - docs

Google is making a decent move in the developer-tools space with new GitHub integration features for Gemini Code Assist. The standout capability is automated code reviews for GitHub pull requests: you install Gemini as a GitHub app and it automatically analyses pull requests and leaves comprehensive review comments. I've been testing it on a few projects and have found it useful. They say you get 5 free reviews per day on the free plan, but it didn't start refusing until I'd done about 20.

I had a bunch of fun getting Claude Code to finish a ticket and create a pull request; Gemini would automatically review it, and then I'd get Claude to read and respond to the review. There was a bit of plumbing to get dialled in because of boring stuff about nested comments and reviews in GitHub pull requests, but the whole pipeline was a little more autonomous and got further before I had to get involved.
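If you want to wire up something similar, here's a rough sketch of that plumbing, assuming you have the gh CLI authenticated. The repo name and PR number are placeholders, and handing the collected feedback to Claude is left as a print statement:

```python
import json
import subprocess

REPO = "your-org/your-repo"  # placeholder
PR_NUMBER = 42               # placeholder

def gh_json(*args: str):
    """Run a gh CLI command and parse its JSON output."""
    out = subprocess.run(["gh", *args], check=True, capture_output=True, text=True)
    return json.loads(out.stdout)

# Top-level reviews (the summary Gemini posts when it reviews the PR)
reviews = gh_json("api", f"repos/{REPO}/pulls/{PR_NUMBER}/reviews")

# Inline comments attached to specific diff lines (the nested ones)
inline = gh_json("api", f"repos/{REPO}/pulls/{PR_NUMBER}/comments")

feedback = []
for r in reviews:
    if r.get("body"):
        feedback.append(f"Review ({r['user']['login']}): {r['body']}")
for c in inline:
    feedback.append(f"{c['path']}:{c.get('line', '?')} ({c['user']['login']}): {c['body']}")

# Hand this to whatever agent is driving the next iteration,
# e.g. paste it into a Claude Code session or a scripted prompt.
print("\n\n".join(feedback))
```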

I think we are quite close to having workflows where AI agents spec out tickets, pull them down autonomously and complete them, multiple agents review and approve, and another agent tests the staging site, with human-in-the-loop interventions scattered throughout. I find there are still a lot of "I'm finished" / "no you're not, everything is broken" steps with the agentic coders, but it feels more like ironing out the details than a big strategic rethink to get to a much more automated process.

Google has a whole bunch of other features in Code Assist aimed at competing with GitHub Copilot, Cursor etc. But while Gemini 2.0 is a pretty strong model and Google offers some very sharp prices on it, it's no Claude 3.7 Sonnet, so I haven't experimented with using it as my main coding model.

I rated the ease of setup and the value I got from having automatic PR reviews though, and have added it to my list of coding extensions to give a thorough review.

Cheers, JV

P.S. We've got some space opening up in April/May for consulting work, so get in touch if there's something the team and I can help you with!

Code With JV

Each week I share the three most interesting things I found in AI
