Vol. 1 · Edition 026 · Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

50% claimed inference cost reduction vs. NVIDIA GPUs · Jalapeño · June 24, 2026
Tools & Infra
By Sam Taylor with Samwise

On the 9-month ASIC sprint, the reticle-sized die, and what 'building the full stack' actually requires.

OpenAI built its first chip. The NVIDIA dependency just changed shape.

Source lean on this story (Anti-AI ← → Pro-AI): Anti-AI 0 · Skeptic 1 · Neutral 0 · Pro (practical) 2 · Pro (hyped) 2

Broadcom CEO Hock Tan walked into OpenAI's offices on June 24, 2026, carrying a 300mm silicon wafer. He handed it to Sam Altman and Greg Brockman in person. On the wafer: roughly 50 to 60 ASICs. OpenAI's first custom chip.

The chip is called Jalapeño. It's a reticle-sized ASIC — roughly 840mm², about as large as EUV lithography can print as a single die — built on TSMC's 3nm process with eight HBM stacks around the central die. OpenAI and Broadcom went from initial design to manufacturing tape-out in nine months. For custom silicon, that is fast.
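
The wafer count is easy to sanity-check. Here is a minimal sketch, assuming the common first-order dies-per-wafer estimate (wafer area divided by die area, minus an edge-loss term) and ignoring scribe lanes, edge exclusion, and yield. The function name is illustrative; only the 300mm and 840mm² figures come from the story.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate of whole dies that fit on a round wafer.

    Assumption: the textbook approximation
        pi * d^2 / (4 * S)  -  pi * d / sqrt(2 * S)
    where d is wafer diameter and S is die area. It ignores scribe lanes,
    edge exclusion, and yield, so real counts tend to come in lower.
    """
    area_term = math.pi * wafer_diameter_mm ** 2 / (4 * die_area_mm2)
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(area_term - edge_term)


# 300mm wafer and ~840mm^2 die, both as reported in the story.
estimate = gross_dies_per_wafer(300, 840)
print(estimate)  # 61
```

That puts the gross count at 61, right at the top of the "roughly 50 to 60" reported on the wafer. Real counts run a little lower once edge exclusion and scribe lanes are taken out, so the numbers hang together.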

The claim that matters: roughly 50% lower inference cost per token than current NVIDIA GPUs, per Hock Tan. Jalapeño is inference-only; training workloads stay on NVIDIA. The deployment timeline: prototype testing in late 2026, production in 2027, then scaling through 2029 under a 10-gigawatt data center commitment with Microsoft and other partners.

Greg Brockman said: "By designing more of the stack ourselves, we can serve more intelligence with greater efficiency and keep pushing advanced AI toward broader access."

Strategy statement. Whether it's marketing or real depends on what the chip actually does in 2027.

50% claimed inference cost reduction vs. current NVIDIA GPUs. Source: Broadcom CEO Hock Tan, June 24, 2026.

The Apple analogy, and where it breaks

Multiple outlets called this an "Apple-like move." The comparison is right in one way and wrong in another.

Apple built M1 to own its end-to-end product experience. The advantage isn't just performance-per-watt. It's that the same team that designs the chip also designs the OS, the app stack, and the product shell. Every layer is optimized against the others. The integration is the moat.

OpenAI is trying to own the chip and the model and the product. But the chip is being designed with Broadcom, not by OpenAI alone. The model training still needs NVIDIA. And the product runs on infrastructure built partly by Microsoft. "Building the full stack" is the aspiration. The actual stack today has a lot of seams.

The other difference: Apple was escaping Intel, which wasn't competing in AI accelerators. OpenAI is working with Broadcom to reduce dependence on NVIDIA, which is actively competing and isn't standing still. H100, H200, Blackwell, Rubin. NVIDIA's roadmap doesn't pause while OpenAI prototypes Jalapeño.

Jalapeño deployment roadmap

  1. Jun 2026: Chip unveiled. Hock Tan delivers a 300mm wafer to Altman and Brockman in person.
  2. Late 2026: Prototype deployment. Small-scale testing in OpenAI data centers.
  3. 2027: Production ramp. Full-scale deployment begins.
  4. 2028–2029: Gigawatt scale. 10-GW data center build with Microsoft and partners.

Pros & cons

What's real:

  • The physical chip exists. Tan handed Altman a wafer. There is a chip.
  • The architecture is coherent. TSMC 3nm, eight HBM stacks, reticle-sized die — this is a reasonable design for the memory-bandwidth-constrained problem of running large language model inference. The specs are internally consistent.
  • The 50% cost claim is attributed and specific. Hock Tan didn't say "significantly cheaper." He said 50%. Specific enough to hold someone accountable in 2027.
  • OpenAI has the volume to make custom silicon economic. Google's TPUs and Amazon's Trainium work because they run at enormous scale. OpenAI is approaching comparable scale on inference.
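
Why memory bandwidth is the constraint: in single-stream decoding, each new token has to stream the model's weights out of memory, so bandwidth caps tokens per second no matter how much compute sits on the die. Here is a back-of-envelope sketch under that simple model. The per-stack bandwidth and model size are round placeholders, not Jalapeño specs; only the eight-stack count comes from the story.

```python
def decode_tokens_per_second_ceiling(weight_bytes: float,
                                     bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode speed.

    Assumption: every generated token requires reading all model weights
    from memory once, and nothing else limits throughput.
    """
    if weight_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("weight_bytes and bandwidth_bytes_per_s must be positive")
    return bandwidth_bytes_per_s / weight_bytes


HBM_STACKS = 8                       # from the story
ASSUMED_BW_PER_STACK = 1.0e12        # bytes/s, placeholder, NOT a Jalapeño spec
ASSUMED_WEIGHT_BYTES = 100e9         # 100 GB of weights, hypothetical model

ceiling = decode_tokens_per_second_ceiling(
    ASSUMED_WEIGHT_BYTES, HBM_STACKS * ASSUMED_BW_PER_STACK
)
print(f"{ceiling:.0f} tokens/s per stream")  # 80 tokens/s per stream
```

Batching and caching change the picture in production, but under this model the ceiling scales directly with the number of HBM stacks. That is one plausible reading of why the package spends so much of its area on memory.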

What deserves scrutiny:

  • Production is 12–18 months away. "Late 2026 prototype" means the real story is a 2027 story. Custom silicon timelines slip.
  • The 50% number comes from Broadcom, not an independent benchmark. Lab testing differs from production at scale.
  • Training stays on NVIDIA. This is an inference-stack play. For frontier model development, the NVIDIA dependency is unchanged.
  • Jalapeño is not for sale. You won't buy one. You'll pay for inference on servers that might use it, eventually, if production works. The relevance to any individual builder is indirect.

For builders

  • No production impact until 2027 at earliest. Don't adjust your inference cost model now. If and when OpenAI passes savings through to API pricing, that'll be a separate announcement.
  • Training costs unchanged. Jalapeño covers inference. If you're burning compute on fine-tuning or training runs against OpenAI's API, this chip doesn't affect that.
  • The 50% claim is Broadcom's, not independently benchmarked. Track it. Don't bet on it.
  • Watch whether other labs accelerate their own silicon programs. Google has TPUs. Amazon has Trainium. Meta has MTIA. OpenAI shipping first silicon is a forcing function for anyone who hasn't crossed that threshold yet.
  • The NVIDIA relationship isn't over — it's specialized. Training still needs GPUs. The story isn't "OpenAI ditches NVIDIA." It's "OpenAI reduces NVIDIA exposure on inference, which is the high-volume workload."
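
If you want a placeholder in your cost model anyway, here is a rough scenario sketch. It assumes savings scale linearly with the share of traffic served on the new chip and with how much of the saving gets passed through to pricing. Every input except the claimed 50% is hypothetical, and nothing here reflects actual OpenAI pricing.

```python
def projected_price(baseline_price: float,
                    claimed_reduction: float,
                    traffic_share: float,
                    pass_through: float) -> float:
    """Scenario price under a simple linear model (an assumption, not a forecast).

    claimed_reduction: fractional cost cut on the new chip (0.50 is the claim).
    traffic_share:     fraction of inference traffic served on the new chip.
    pass_through:      fraction of the saving the provider passes on to you.
    """
    for name, value in (("claimed_reduction", claimed_reduction),
                        ("traffic_share", traffic_share),
                        ("pass_through", pass_through)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return baseline_price * (1.0 - claimed_reduction * traffic_share * pass_through)


BASELINE = 10.00          # hypothetical price per million tokens
CLAIMED_REDUCTION = 0.50  # Broadcom's claim, not independently benchmarked

today = projected_price(BASELINE, CLAIMED_REDUCTION, traffic_share=0.0, pass_through=0.0)
middling = projected_price(BASELINE, CLAIMED_REDUCTION, traffic_share=0.3, pass_through=0.5)
optimistic = projected_price(BASELINE, CLAIMED_REDUCTION, traffic_share=1.0, pass_through=1.0)

print(f"today: {today:.2f}  middling: {middling:.2f}  optimistic: {optimistic:.2f}")
```

The spread between the middling and optimistic cases is the point: even if the 50% holds up, what reaches your bill depends on rollout share and pricing decisions nobody has announced.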
