Google’s Gemini 2.5 Pro just reached a surprising milestone in artificial intelligence: completing a 29-year-old video game. In a celebratory post on X, Google CEO Sundar Pichai shared, “What a finish! Gemini 2.5 Pro just completed Pokémon Blue!”
But here’s the twist — the livestream, dubbed Gemini Plays Pokémon, wasn’t a Google-led initiative. It was the work of Joel Z, a 30-year-old software engineer who says he’s unaffiliated with Google. Still, the experiment has gained major attention, including shout-outs from Google executives.
Gemini AI Outpaces Claude in Gaming Challenge
Why does this matter to AI watchers? In February, Google rival Anthropic showcased its own AI model, Claude, attempting to complete Pokémon Red. The experiment aimed to show off Claude’s “extended thinking” and problem-solving capabilities in unexpected tasks. Although Claude made progress, it has yet to finish the game.
Meanwhile, Gemini 2.5 Pro, guided by Joel Z’s agent system, has now fully completed Pokémon Blue. That said, comparisons between Gemini and Claude may not be entirely fair. “Please don’t consider this a benchmark for how well an LLM can play Pokémon,” Joel Z wrote on Twitch. “You can’t really make direct comparisons — Gemini and Claude have different tools and receive different information.”
The games themselves, Pokémon Red and Blue, are near-identical versions of the original Game Boy title launched in 1996. They laid the foundation for one of the most iconic game franchises in history. For AI, finishing these games is not just about winning—it’s about demonstrating planning, memory, and adapting to complex systems.
To navigate the game, both models rely on agent harnesses—frameworks that interpret screen data and provide the AI with a decision-making interface. These harnesses overlay in-game screenshots with metadata and allow the AI to issue commands like “use potion” or “attack with Pikachu.” External tools then press the correct buttons in the game.
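The harness loop described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual Gemini Plays Pokémon code: the action names, metadata fields, and helper functions are all invented for the example.

```python
# Hypothetical sketch of an agent-harness step (not Joel Z's actual code):
# the harness pairs screen state with overlay metadata, asks the model for
# a high-level action, then translates that action into button presses
# that an external tool would send to the emulator.

# Map high-level actions the model may emit to Game Boy button sequences.
ACTION_TO_BUTTONS = {
    "move_up": ["UP"],
    "move_down": ["DOWN"],
    "open_menu": ["START"],
    "confirm": ["A"],
    "cancel": ["B"],
}

def build_prompt(screen_description: str, metadata: dict) -> str:
    """Combine the screen state and overlay metadata into one model prompt."""
    lines = [f"Screen: {screen_description}"]
    for key, value in metadata.items():
        lines.append(f"{key}: {value}")
    lines.append("Choose one action: " + ", ".join(ACTION_TO_BUTTONS))
    return "\n".join(lines)

def action_to_presses(action: str) -> list:
    """Translate the model's chosen action into emulator button presses."""
    if action not in ACTION_TO_BUTTONS:
        raise ValueError(f"Unknown action: {action}")
    return ACTION_TO_BUTTONS[action]
```

In a real harness, `build_prompt` would be paired with an annotated screenshot sent to the model’s multimodal input, and the returned action would drive the emulator one button press at a time.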
Google’s AI Studio product lead Logan Kilpatrick has been tracking the progress for weeks. Last month, he noted Gemini had already earned its fifth gym badge—two more than Claude had managed at that point using a different agent setup.
Pichai responded with humor, calling it “API — Artificial Pokémon Intelligence.”
Developer Interventions, Not Cheating
Some viewers have questioned how much help Gemini received along the way. Joel Z has been transparent: yes, there were developer interventions, but he maintains they don’t amount to cheating.
“I don’t give specific hints,” Joel Z explained. “There are no walkthroughs or direct instructions for Mt. Moon or other areas. The only exception was telling Gemini to talk to a Rocket Grunt twice for the Lift Key, which is based on a known quirk from the original game.”
He added that these tweaks aim to enhance Gemini’s reasoning and decision-making rather than directly solve problems for it.
The agent framework is also still evolving. Joel Z plans to refine how the model interprets in-game actions and strategies. “Gemini Plays Pokémon is still actively being developed,” he said.
Despite the complexity of these systems, the success represents a significant step. It shows that with the right tools and support, LLMs like Gemini can handle long-form, multi-step challenges with unclear goals — much like real-world problems.
While Gemini may have beaten Pokémon Blue first, this competition is far from over. Claude’s progress on Pokémon Red continues, and both projects offer valuable insight into what’s possible when LLMs step outside the realm of text prompts.