On Terminal-Bench 76.2%, the 3x price hike over Gemini 3 Flash, and whether 'frontier performance at Flash speed' holds outside Google's own benchmarks.
Gemini 3.5 Flash outperforms 3.1 Pro and runs 4x faster. The pricing is the catch.
Stance spread, Anti-AI to Pro-AI: Anti-AI 0 · Skeptic 1 · Neutral 0 · Pro (practical) 2 · Pro (hyped) 1
Google announced Gemini 3.5 Flash at I/O 2026 on May 19. The model outperforms Gemini 3.1 Pro on coding and agentic benchmarks while running 4x faster than comparable frontier models, priced at $1.50 per million input tokens and $9.00 per million output tokens.
Source spread
How each source framed it:
- Google I/O 2026 keynote — hype. Google's own framing: "strongest coding model yet, frontier performance at Flash speeds."
- MarkTechPost — Gemini 3.5 Flash — builder. Agentic and coding use case breakdown, mostly positive.
- Artificial Analysis — Gemini 3.5 Flash — builder. Independent intelligence and output-speed index, places 3.5 Flash in the high-intelligence / high-speed quadrant.
- TechTimes — 3x price hike — skeptic. Flags the cost increase over previous Flash explicitly.
Pros & cons
What's real:
- Gemini 3.5 Flash outperforms Gemini 3.1 Pro on agentic and coding benchmarks at $1.50 per million input tokens, below the Pro price. If the numbers hold, that's a genuine gain in performance per dollar.
- Terminal-Bench 2.1 score of 76.2%, MCP Atlas at 83.6%, and a GDPval-AA Elo of 1,656 are strong numbers for agentic work, with the Artificial Analysis intelligence index independently corroborating the capability ranking.
- 4x faster output token throughput than comparable frontier models changes the architecture math on latency-sensitive agentic loops.
- Cached input tokens are $0.15 per million — much cheaper than uncached, which matters a lot for long-context repeated queries.
- Google co-launched a Managed Agents API on May 19: isolated Linux environments, tool use, code execution, all via the Gemini API. That's a new category of capability, not just a model swap.
What's uncertain:
- The benchmark suite Google led with (Terminal-Bench 2.1, GDPval-AA, MCP Atlas) is agentic-focused and relatively new. No SWE-bench Verified comparison was published at launch, and no MMLU numbers either.
- "4x faster than other frontier models" is Google's claim with no independent measurement cited at announcement. Output speed varies significantly with context length and load.
- $1.50 per million input tokens is a 3x price hike over Gemini 3 Flash ($0.50). If you chose Flash for cost reasons, this model is a different budget conversation.
- Google is calling this Flash, but it performs like Pro. That blurs what the tier names mean, which matters when you plan a model lineup a year out.
What I think is happening here
Gemini 3.5 Flash is genuinely interesting. Not just keynote-interesting — actually interesting.
The claim Google is making here is one that used to be contradictory: a Flash-tier model that beats their own Pro-tier model on the tasks that matter for builders right now. That isn't spin, at least not entirely. Artificial Analysis is independent and they corroborate the intelligence ranking. If the agentic benchmark numbers hold up under independent testing, this is the kind of release that changes the "when do I reach for a frontier model vs. a lighter one" question.
But the pricing needs saying clearly, because the coverage is soft-pedaling it. Gemini 3.5 Flash is $1.50 per million input tokens. Not $0.50. Three times the previous Flash price. Google is framing this as "cheaper than Pro" — which is technically true, it's 40% cheaper than Gemini 3.1 Pro — but the relevant comparison for most builders who were already using Flash is the 3x hike versus what they were paying before. If your cost model was built around Gemini Flash at $0.50, that math doesn't carry forward. "Flash" is now a performance tier label, not a price tier label.
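To make the budget conversation concrete, here is a minimal sketch of the arithmetic using only the prices quoted in this post. The function names and the example token counts are mine, purely illustrative. One loud assumption: the break-even compares a blended 3.5 Flash input price against the old $0.50 uncached Flash input price, because the old model's cached and output prices aren't quoted here.

```python
# Prices in USD per million tokens, as quoted in this post.
FLASH_35 = {"input": 1.50, "cached_input": 0.15, "output": 9.00}
PREV_FLASH_INPUT = 0.50  # Gemini 3 Flash uncached input; its other prices are not quoted here


def blended_input_price(cache_hit_rate, prices=FLASH_35):
    """Effective input price per million tokens at a given cache-hit rate (0 to 1)."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    uncached_share = 1.0 - cache_hit_rate
    return uncached_share * prices["input"] + cache_hit_rate * prices["cached_input"]


def request_cost(input_tokens, output_tokens, cache_hit_rate=0.0, prices=FLASH_35):
    """USD cost of a single request (hypothetical helper, not a Google API)."""
    input_cost = input_tokens / 1_000_000 * blended_input_price(cache_hit_rate, prices)
    output_cost = output_tokens / 1_000_000 * prices["output"]
    return input_cost + output_cost


def break_even_cache_rate(target_price, prices=FLASH_35):
    """Cache-hit rate at which the blended input price equals target_price."""
    return (prices["input"] - target_price) / (prices["input"] - prices["cached_input"])


if __name__ == "__main__":
    # Illustrative request shape: 20k input tokens, 1k output tokens.
    print(request_cost(20_000, 1_000))
    print(request_cost(20_000, 1_000, cache_hit_rate=0.8))
    print(break_even_cache_rate(PREV_FLASH_INPUT))
```

With that example shape, an uncached request comes to about $0.039, and about $0.017 at an 80% cache-hit rate. The break-even works out to roughly 74%: under the assumption above, you'd need about three quarters of your input tokens served from cache before the blended input price gets back down to the old $0.50.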
The benchmark gaps bother me more than the pricing. Google led with Terminal-Bench 2.1 and MCP Atlas. Not SWE-bench Verified. Terminal-Bench is useful and covers real software engineering tasks, but it's one benchmark and it's the one Google chose. SWE-bench Verified is the canonical comparison point right now: Anthropic publishes it, OpenAI publishes it, every serious model launch publishes it. The absence is noticeable. I'm not saying 3.5 Flash is secretly bad at software engineering. I'm saying test it on your actual workload before trusting any of these numbers.
Anyways. The Managed Agents API is the co-launch I'm actually watching. Google now has an agent execution environment alongside a fast, capable model: isolated containers, tools, code execution in the Gemini API. That's a direct move into Claude Code and OpenAI Codex territory, at $1.50 per million input tokens on a model Google claims beats its own Pro tier. If those claims stand up, this becomes a real three-way race for the agentic API workload.
- Available today via the Gemini API: $1.50 per million input tokens, $9.00 output, $0.15 cached input. That's 3x the previous Flash pricing — update cost estimates before switching.
- Run your own evals, especially on SWE-bench Verified or your actual codebase tasks. Google didn't publish a SWE-bench Verified number at launch, and Terminal-Bench 2.1 alone isn't enough to decide on production.
- Managed Agents API co-launched May 19: isolated Linux environments with tool use and code execution via the Gemini API. Worth evaluating separately from the model itself if you're building agentic workflows.
- If you're currently on Gemini 3.1 Pro, the economics for switching to 3.5 Flash are potentially positive — but test first, especially for tasks where you depend on 3.1 Pro's behavior.
- The Artificial Analysis intelligence index is an independent datapoint worth checking alongside Google's own benchmarks.
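For the "run your own evals" item above, a minimal sketch of what that can look like: a pass-rate comparison over your own tasks. Everything here is hypothetical scaffolding. `Task`, `pass_rate`, and `compare` are names I made up, and each "model" is just a callable that takes a prompt and returns text, so you'd wrap whatever client you actually use. The stub models below exist only so the sketch runs without an API key.

```python
from typing import Callable, Dict, List, NamedTuple


class Task(NamedTuple):
    prompt: str
    check: Callable[[str], bool]  # True if the model's output passes


def pass_rate(model: Callable[[str], str], tasks: List[Task]) -> float:
    """Fraction of tasks that pass; a model or checker crash counts as a failure."""
    if not tasks:
        raise ValueError("no tasks supplied")
    passed = 0
    for task in tasks:
        try:
            if task.check(model(task.prompt)):
                passed += 1
        except Exception:
            pass  # treat any error as a failed task
    return passed / len(tasks)


def compare(models: Dict[str, Callable[[str], str]], tasks: List[Task]) -> Dict[str, float]:
    """Run every model on the same task list and return pass rates by name."""
    return {name: pass_rate(fn, tasks) for name, fn in models.items()}


# Stub tasks and stub "models" standing in for real API calls.
tasks = [
    Task("say ok", lambda out: out == "ok"),
    Task("say hi", lambda out: out == "hi"),
]
stub_models = {
    "always_ok": lambda prompt: "ok",
    "echo_last_word": lambda prompt: prompt.split()[-1],
}

if __name__ == "__main__":
    print(compare(stub_models, tasks))
```

The point of running every model over the same task list is that the numbers are comparable to each other, which is more than you can say for benchmarks picked by the vendor. Swap the stubs for tasks pulled from your own codebase and the checks for your real test suite.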
Further reading
- Google I/O 2026 developer keynote — all official May 19 announcements, including Managed Agents API and Gemini Omni
- MarkTechPost — Gemini 3.5 Flash at I/O 2026 — capability summary with pricing breakdown
- Artificial Analysis — Gemini 3.5 Flash — independent intelligence and speed index
- TechTimes — the pricing angle — on the 3x hike over Gemini 3 Flash
- Simon Willison — more expensive, but Google plans to use it for everything — independent developer take on the pricing direction