Gemini 2.0 Flash images, OpenAI agent toolkits, Gemini Code Assist


Hi Reader,

Here are three things I found interesting in the world of AI in the last week:

Gemini 2.0 Flash now generates images natively - no more Imagen - blog post

Google just dropped a significant update to Gemini 2.0 Flash, bringing native image generation directly into the model. Previously, Google relied on a separate system (Imagen) to handle image generation, the same way ChatGPT relies on DALL-E 3. Now Gemini can create (and edit) images on its own without any additional models, which is a pretty big milestone for multi-modal models.
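If you'd rather poke at the native image output from code than from a UI, here's a minimal sketch using the google-genai Python SDK. The model name and response-modality config are my assumptions based on the experimental release, so check the current docs before copying this:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")  # placeholder key

response = client.models.generate_content(
    # Experimental image-capable model name at time of writing; may change
    model="gemini-2.0-flash-exp",
    contents="A watercolour painting of a lighthouse at dusk",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],  # ask for image output alongside text
    ),
)

# The image comes back as inline bytes in one of the response parts
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("lighthouse.png", "wb") as f:
            f.write(part.inline_data.data)
    elif part.text:
        print(part.text)
```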

But Google being Google - forever famous for search results about putting glue on pizza and eating rocks - it looks like they've undercooked the guardrails a little. They're obviously trying: they shipped ShieldGemma 2 as part of their Gemma 3 launch, which is specifically focused on stopping people from making dodgy images. That part seems to be working fine, but according to TechCrunch, users have discovered that Flash's image editing capabilities are extremely effective at removing watermarks from stock photos and copyrighted images - which is very much against the DMCA and lawsuit-worthy.

Within days of release, social media was filling with examples of perfectly cleaned images that previously had prominent watermarks from Getty, Shutterstock, and other providers. Google claims to have implemented safeguards against copyright infringement, but the early evidence suggests those guardrails aren't working as intended.

Which reminds me of one of my favourite programming jokes.

A programmer is about to open a bar and invites their tester friends to come and make sure it works fine. They order one drink, infinity drinks, 1.234532432 drinks, 1/3 of a drink, -1 drinks, π drinks, 四 drinks. They present a -$10 voucher and try to order 100 drinks with it. Everything works fine, so the programmer opens the bar. The first customer comes in, asks to use the restroom, and the bar immediately catches fire and burns to the ground.

It also reminds me of how capable these AI models are and how much work the companies have to do to restrict those capabilities. Imagine being an early tester of GPT-4 - wow, it can write code and poetry and summarise emails; oh, it can also help you make a bomb and gives great advice about how to dispose of a body undetected.

Still, Gemini Flash is pretty good at making and editing images - text rendering kind of works, and you can keep consistent characters across multiple images. You can play with it in Google AI Studio.

OpenAI launches complete agent-building platform - announcement

OpenAI just unveiled their new agent-building APIs and on first glance, it looks... fine. You can now access the tools that power Operator and Deep Research through the API and build your own versions. The centerpiece is the new Responses API, which essentially marries the simplicity of Chat Completions with the tool-use capabilities of the Assistants API - which, they've announced, will be sunset by mid-2026.

Chat Completions was the original OpenAI API. It became so popular that a whole lot of libraries and providers replicated it and promoted themselves as "OpenAI compatible" to make it very easy for developers to switch over to them.
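To make the difference concrete, here's a minimal sketch of the two call styles using the openai Python SDK. The model name is a placeholder and the built-in tool name may have changed since launch, so treat this as a rough shape rather than gospel:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Classic Chat Completions: you manage the message list yourself
chat = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarise this email: ..."}],
)
print(chat.choices[0].message.content)

# New Responses API: a single `input`, plus hosted tools like web search
resp = client.responses.create(
    model="gpt-4o-mini",  # placeholder model name
    input="What changed in OpenAI's latest agents announcement?",
    tools=[{"type": "web_search_preview"}],  # built-in tool; name may differ
)
print(resp.output_text)
```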

The Assistants API dates from when they released GPTs and wanted to give people access to the building blocks that drove them - and no one cared. It was more expensive than rolling your own RAG system and gave you a lot less control. I think it's fine for prototyping, or if you don't want to learn how to build RAG from scratch, but it's not something I would recommend relying on long term.

The Agents API feels more like Assistants than Chat Completions to me. Useful if you want to prototype something very quickly, or if you don't know how to set up an MCP client that can access tools like web search or computer use. Strategically, OpenAI desperately doesn't want to be a commodity AI provider competing on price/capability numbers; they would love people to deeply integrate with bespoke APIs that only they offer.

I suggest you spend your time learning about MCP servers instead.

Gemini Code Assist brings intelligent code reviews to GitHub - docs

Google is making a decent move in the developer-tools space with new GitHub integration features for Gemini Code Assist. The standout capability is automated code reviews for GitHub pull requests: you install Gemini as a GitHub app and it automatically analyses pull requests and leaves comprehensive review comments. I've been testing it on a few projects and have found it useful. They say you get 5 free reviews per day on the free plan, but it didn't start refusing until I'd done about 20.

I had a bunch of fun getting Claude Code to finish a ticket and create a pull request; Gemini would automatically review it, and then I'd get Claude to read and respond to the review. There was a bit of plumbing to get dialled in because of boring stuff about nested comments and reviews in GitHub pull requests, but the whole pipeline was a little more autonomous and got further before I had to get involved.
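If you want to wire up something similar, here's a rough sketch of that plumbing, assuming you have the gh CLI authenticated. The repo name and PR number are placeholders, and handing the collected feedback to Claude is left as a print statement:

```python
import json
import subprocess

REPO = "your-org/your-repo"  # placeholder
PR_NUMBER = 42               # placeholder

def gh_json(*args: str):
    """Run a gh CLI command and parse its JSON output."""
    out = subprocess.run(["gh", *args], check=True, capture_output=True, text=True)
    return json.loads(out.stdout)

# Top-level reviews (the summary Gemini posts when it reviews the PR)
reviews = gh_json("api", f"repos/{REPO}/pulls/{PR_NUMBER}/reviews")

# Inline comments attached to specific diff lines (the nested ones)
inline = gh_json("api", f"repos/{REPO}/pulls/{PR_NUMBER}/comments")

feedback = []
for r in reviews:
    if r.get("body"):
        feedback.append(f"Review ({r['user']['login']}): {r['body']}")
for c in inline:
    feedback.append(f"{c['path']}:{c.get('line', '?')} ({c['user']['login']}): {c['body']}")

# Hand this to whatever agent is driving the next iteration,
# e.g. paste it into a Claude Code session or a scripted prompt.
print("\n\n".join(feedback))
```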

I think we are quite close to having workflows where AI agents spec out tickets, pull them down autonomously and complete them, multiple agents review and approve, and another agent tests the staging site, with human-in-the-loop interventions scattered throughout. I find there are still a lot of "I'm finished" / "no you're not, everything is broken" steps with the agentic coders, but it feels more like ironing out the details than a big strategic rethink to get to a much more automated process.

Google has a whole bunch of other features in Code Assist aimed at competing with GitHub Copilot, Cursor etc. But while Gemini 2.0 is a pretty strong model and Google offers some very sharp prices on it, it's no Claude 3.7 Sonnet, so I haven't experimented with using it as my main coding model.

I rated the ease of setup and the value I got from having automatic PR reviews though, and have added it to my list of coding extensions to give a thorough review.

Cheers, JV

P.S. We've got some space opening up in April/May for consulting work, so get in touch if there's something the team and I can help you with!

Code With JV

Each week I share the three most interesting things I found in AI
