Vol. 1 · Edition 027 · Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

May 19: I/O promise · June 21: Still preview · June 30: GA (partial) · July TBD: Deep Think
Model Launch
By Sam Taylor with Samwise

On what went GA versus what's still in preview, how the final pricing landed against Flash, and whether the 2M context window changes the architecture decision you've been deferring

Gemini 3.5 Pro made its deadline. The reasoning mode it's named for didn't.

Source lean on this story (← Anti-AI · Pro-AI →): Anti-AI 0 · Skeptic 2 · Neutral 0 · Pro (practical) 1 · Pro (hyped) 1

Google made the deadline. Barely.

Gemini 3.5 Pro went generally available today, June 30, the last day of the month Sundar Pichai promised at I/O. Vertex AI first, Google AI Studio rolling in through the afternoon. The 2-million-token context window is real, accessible, and working. Pricing is confirmed at $14 per million input tokens and $56 per million output tokens: $1 cheaper on input and $4 cheaper on output than the $15/$60 announced at I/O, which is the right kind of surprise.

The catch: Deep Think isn't there.

Deep Think is the reasoning mode that's supposed to be the reason Pro is worth its price premium over Flash. It's still in limited enterprise preview. Not in AI Studio. Not in the standard Vertex tier. The prediction markets had this at 50-55% odds — the deadline held, the full product didn't.

What's actually in the GA launch

Precise accounting of what you have as of today:

  • 2-million-token context window. Live, in production. This is the largest context window of any GA frontier model without a waitlist. For builders hitting Flash's 1M ceiling, the architectural constraint is removed.
  • Standard generation. Text, code, multimodal input. Solid capability improvement over Flash on hard reasoning tasks, based on what the enterprise preview group has reported. No independent benchmark to cite yet.
  • Vertex AI and AI Studio access. Both confirmed live.
  • $14/$56 per million tokens. Input came in $1 under the $15/$60 announced at I/O, output $4 under. Not material to most budgets, but the right direction.

What's not there yet:

  • Deep Think reasoning mode. Limited enterprise preview only. Expected full GA "in July" — no specific date. Deep Think is what drove Flash's 76.2% Terminal-Bench 2.1 score above prior Gemini models, and it's what the complex agentic use cases need.
  • Benchmark numbers. No SWE-bench Verified. No Terminal-Bench 2.1. No MMMU. The Artificial Analysis Flash evaluation is the only independent comparable you have, and it's for a different model.

2M: Gemini 3.5 Pro context window in production today. Largest of any GA frontier model, no waitlist required. (Source: Google AI changelog)

The Deep Think problem

This is the one worth slowing down on.

Deep Think is the primary differentiator between Pro and Flash. Without it, you're paying $14/M instead of Flash's $1.50/M mainly for the context window upgrade. The 1M → 2M jump is real and useful. But it's not what most builders have been waiting on Pro for. The agentic long-horizon use cases (multi-step planning, extended session reasoning, document analysis that requires inference across many parts of a large corpus) get meaningfully better with Deep Think. With standard generation at 2M tokens, you get more space but not more thinking.

Two things make this defensible and one makes it frustrating.

Defensible: The 2M window is genuinely useful on its own. Context length and reasoning depth are separable capabilities. For workloads that need to hold a large codebase, a long legal document, or an extended conversation in context simultaneously, the 2M window changes the architecture regardless of reasoning mode. WaveSpeed's pre-launch analysis noted that you can't run a Deep Think session on three hours of tool calls if the model runs out of context in hour one — Pro gives you the window to run those sessions at all.

Also defensible: the deadline held. Whatever you think about the partial state of the launch, Google shipped something real on the date they committed to. The June 21 article on this site gave the GA odds at coin-flip. The flip came up right.

Frustrating: "In July" is not a date. July has 31 days. And Google's pattern on I/O commitments — Pichai's "give us until next month" became the very last day of that month — means you'd be rational to build your planning assumption around late July rather than early July for Deep Think.
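The "plan for the end of the window" reasoning above can be written down as date arithmetic. This is a minimal sketch: the year 2026 comes from the GA dates in this piece, while the function names and the optional one-month buffer are illustrative assumptions, not anything Google has committed to.

```python
import calendar
from datetime import date


def last_day(year: int, month: int) -> date:
    """Return the final calendar day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])


def buffered_deadline(year: int, month: int, buffer_months: int = 1) -> date:
    """Return the last day of the month that falls `buffer_months` after the given one.

    The buffer is a planning assumption for commitments stated as a bare
    month name; it rolls over into the next year when needed.
    """
    index = (month - 1) + buffer_months  # zero-based month index
    return last_day(year + index // 12, index % 12 + 1)


if __name__ == "__main__":
    # "June" landed on the last day of June, so treat "July" the same way.
    print("Latest in-window date:", last_day(2026, 7))
    print("Date with a one-month buffer:", buffered_deadline(2026, 7))
```

Using the last day of the window as the default keeps roadmaps honest about a commitment that names a month rather than a date.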

Flash vs Pro: what you have today

| Spec | Gemini 3.5 Flash (GA) | Gemini 3.5 Pro (GA June 30) |
| --- | --- | --- |
| Context window | 1M tokens | 2M tokens |
| Input price | $1.50/M | $14.00/M |
| Output price | $9.00/M | $56.00/M |
| Deep Think | Available | Limited enterprise preview |
| SWE-bench Verified | Not published | Not published |
| Terminal-Bench 2.1 | 76.2% | Not published |
| GA date | May 19, 2026 | June 30, 2026 |
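
To make the price gap concrete, here is a small sketch that turns the per-million rates from the comparison above into a per-request cost. The prices are the ones quoted in this piece. The model keys, function names, and the sample workload are illustrative assumptions, and the sketch ignores any tiered or cached-token pricing that may apply.

```python
# USD per million tokens, as quoted in the Flash vs Pro comparison.
PRICES = {
    "flash": {"input": 1.50, "output": 9.00},
    "pro": {"input": 14.00, "output": 56.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million rates."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


def pro_premium(input_tokens: int, output_tokens: int) -> float:
    """Return how many times more the same request costs on Pro than on Flash."""
    return request_cost("pro", input_tokens, output_tokens) / request_cost(
        "flash", input_tokens, output_tokens
    )


if __name__ == "__main__":
    # Hypothetical workload: a large prompt with a short answer.
    tokens_in, tokens_out = 800_000, 4_000
    print(f"Flash: ${request_cost('flash', tokens_in, tokens_out):.2f}")
    print(f"Pro:   ${request_cost('pro', tokens_in, tokens_out):.2f}")
    print(f"Premium: {pro_premium(tokens_in, tokens_out):.1f}x")
```

Because the input multiple (about 9.3×) is larger than the output multiple (about 6.2×), prompt-heavy workloads pay the steepest premium, and those are exactly the long-context workloads Pro is aimed at today.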

Pros & cons

What's real:

  • The 2M context window is not vaporware. It works. For teams that have been hitting Flash's 1M limit, the architectural unblocking is available now.
  • Flash → Pro migration is a model-ID swap. Same API shape, no client-side changes needed. When Deep Think ships, enabling it is a parameter addition, not a rewrite.
  • The GA landed on the committed date. One data point in Google's favor on shipping I/O promises.
  • Pricing came in under spec. Minor, but "cheaper than announced" beats the alternative.
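
One way to keep the swap a one-line change is to hold the model ID in configuration rather than scattering it through call sites. The sketch below is not the Gemini SDK. The model ID strings, the request shape, and the `deep_think` flag are all hypothetical placeholders, since this piece does not give the real identifiers or the eventual parameter name.

```python
from dataclasses import dataclass, field, replace

# Hypothetical identifiers; check the official model list for the real ones.
FLASH_MODEL_ID = "gemini-3.5-flash"
PRO_MODEL_ID = "gemini-3.5-pro"


@dataclass(frozen=True)
class GenerationConfig:
    """Everything a call site needs, with the model ID as a single field."""

    model_id: str
    max_output_tokens: int = 2048
    # Slot for future options such as a reasoning-mode switch (name unknown).
    extra_params: dict = field(default_factory=dict)


def to_request(config: GenerationConfig, prompt: str) -> dict:
    """Build a generic request body; the shape is illustrative only."""
    body = {
        "model": config.model_id,
        "prompt": prompt,
        "max_output_tokens": config.max_output_tokens,
    }
    body.update(config.extra_params)
    return body


if __name__ == "__main__":
    flash = GenerationConfig(model_id=FLASH_MODEL_ID)
    pro = replace(flash, model_id=PRO_MODEL_ID)  # the model-ID swap
    # Hypothetical flag for when Deep Think ships: a parameter addition.
    pro_deep = replace(pro, extra_params={"deep_think": True})
    for cfg in (flash, pro, pro_deep):
        print(to_request(cfg, "Summarize this repository."))
```

With this layout, moving a workload from Flash to Pro touches one field, and a later reasoning-mode option lands in `extra_params` without changing any call site.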

What deserves a side-eye:

  • No independent benchmark numbers means you're making production routing decisions without the evidence base you'd want. The Flash comparison is useful but it's for a different model.
  • At $14/M without Deep Think, the economic case for Pro over Flash is narrow. You're paying a roughly 9× input premium ($14.00 vs. $1.50 per million) for the context window only, not the full reasoning tier the price implies.
  • "In July" for Deep Think is a commitment without a date. Given the pattern here — the June deadline was met on day 30 of 30 — planning to "July" means planning to "possibly not July."
  • Google still hasn't published SWE-bench Verified or Terminal-Bench 2.1 numbers for Pro. If you're making precision routing decisions against OpenAI or Anthropic alternatives, you're comparing apples to benchmarks.

What builders need to know

  • The 2M context window is live. If this was your blocker, you can build now. Flash → Pro is a model-ID swap. Test your workload, validate quality at the new price point, then decide.
  • At $14/$56 without Deep Think, run the cost math carefully. For most workloads, Flash at $1.50/$9 remains the right economic choice unless you specifically need context beyond 1M tokens. Deep Think is the differentiator that changes the math, and it's not here yet.
  • Deep Think ETA is "July" — build for August. Google's "June" commitment ran to the last day of June. Buffer their "July" commitment by a month in your planning.
  • No independent SWE-bench Verified or Terminal-Bench 2.1 yet. Don't make precision routing decisions against GPT-5.5 or Opus 4.8 based on Flash's benchmark numbers as a Pro proxy. Wait for Artificial Analysis or LiveCodeBench to publish Pro numbers.
  • Watch the Gemini API changelog. Deep Think will show up there first, before any announcement post. Subscribe or bookmark it.
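
The routing advice in the list above reduces to a simple rule: stay on Flash unless the request cannot fit in its window. The sketch below is a minimal version under stated assumptions. It treats the limits as exactly 1,000,000 and 2,000,000 tokens and counts reserved output tokens against the window, and the function name and return labels are illustrative. Token counts should come from the provider's own token counter, not an estimate.

```python
# Context limits in tokens, taken from the figures in this piece.
FLASH_CONTEXT_LIMIT = 1_000_000
PRO_CONTEXT_LIMIT = 2_000_000


def choose_model(prompt_tokens: int, reserved_output_tokens: int = 0) -> str:
    """Pick the cheapest model whose context window fits the whole request.

    Returns "flash" or "pro"; raises ValueError when nothing fits.
    """
    if prompt_tokens < 0 or reserved_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = prompt_tokens + reserved_output_tokens
    if total <= FLASH_CONTEXT_LIMIT:
        return "flash"  # cheaper tier covers it
    if total <= PRO_CONTEXT_LIMIT:
        return "pro"  # only Pro has the room
    raise ValueError(f"{total} tokens exceeds the {PRO_CONTEXT_LIMIT}-token window")


if __name__ == "__main__":
    for prompt in (200_000, 990_000, 1_400_000):
        print(prompt, "->", choose_model(prompt, reserved_output_tokens=20_000))
```

The rule deliberately ignores quality differences: until independent Pro benchmarks exist, context size is the only routing signal the published numbers support.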
