50% claimed inference cost reduction vs. NVIDIA GPUs · Jalapeño · June 24, 2026


By Sam Taylor with Samwise · Jun 25, 2026

On the 9-month ASIC sprint, the reticle-sized die, and what 'building the full stack' actually requires.

OpenAI built its first chip. The NVIDIA dependency just changed shape.


Broadcom CEO Hock Tan walked into OpenAI's offices on June 24, 2026, carrying a 300mm silicon wafer. He handed it to Sam Altman and Greg Brockman in person. On the wafer: roughly 50 to 60 ASICs. OpenAI's first custom chip.

The chip is called Jalapeño. It's a reticle-sized ASIC — roughly 840mm², about as large as EUV lithography can print as a single die — built on TSMC's 3nm process with eight HBM stacks around the central die. OpenAI and Broadcom went from initial design to manufacturing tape-out in nine months. For custom silicon, that is fast.
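The die size and the wafer count quoted above can be cross-checked with a standard first-order dies-per-wafer approximation. This is a rough sketch, not a statement about how Broadcom or TSMC actually laid out the wafer: it ignores scribe lines, edge exclusion, and yield, and the function name is made up for illustration.

```python
import math

def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> float:
    """First-order estimate: wafer area / die area, minus an edge-loss term.

    Assumption: the common textbook approximation, which ignores scribe
    lines, edge exclusion, and defects, so it is an upper-end figure.
    """
    radius = wafer_diameter_mm / 2
    area_term = math.pi * radius ** 2 / die_area_mm2
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return area_term - edge_term

# Figures from the article: 300mm wafer, roughly 840mm² die.
estimate = gross_dies_per_wafer(300, 840)
print(f"about {estimate:.0f} die candidates per wafer")
```

With the article's figures this comes out at about 61 die candidates, which sits at the top of the reported range of roughly 50 to 60 per wafer.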

The claim that matters: roughly 50% lower inference cost per token than current NVIDIA GPUs, per Hock Tan. Jalapeño is inference-only; training workloads stay on NVIDIA. The deployment timeline runs from prototype testing in late 2026 to production in 2027, then scales through 2029 under a 10-gigawatt data center commitment with Microsoft and other partners.

Greg Brockman said: "By designing more of the stack ourselves, we can serve more intelligence with greater efficiency and keep pushing advanced AI toward broader access."

That is a strategy statement. Whether it is marketing or substance depends on what the chip actually does in 2027.


The Apple analogy, and where it breaks

Multiple outlets called this an "Apple-like move." The comparison is right in one way and wrong in another.

Apple built M1 to own its end-to-end product experience. The advantage isn't just performance-per-watt. It's that the same team that designs the chip also designs the OS, the app stack, and the product shell. Every layer is optimized against the others. The integration is the moat.

OpenAI is trying to own the chip and the model and the product. But the chip is being designed by Broadcom, not by OpenAI. The model training still needs NVIDIA. And the product runs on infrastructure built partly by Microsoft. "Building the full stack" is the aspiration. The actual stack today has a lot of seams.

The other difference: Apple was escaping Intel, which wasn't competing in AI accelerators. OpenAI is working with Broadcom to reduce dependence on NVIDIA, which is actively competing and isn't standing still: Hopper (H100, H200), then Blackwell, then Rubin. NVIDIA's roadmap doesn't pause while OpenAI prototypes Jalapeño.

Jalapeño deployment roadmap

Jun 2026: Chip unveiled. Hock Tan delivers 300mm wafer to Altman and Brockman in person.
Late 2026: Prototype deployment. Small-scale testing in OpenAI data centers.
2027: Production ramp. Full-scale deployment begins.
2028–2029: Gigawatt scale. 10-GW data center build with Microsoft and partners.

Source spread

OpenAI — Official Jalapeño announcement — hype. OpenAI's framing: "best inference platform for LLMs," "build the full stack," first in a multi-generation platform. First-party numbers.
Tom's Hardware — Technical breakdown — builder. Best technical detail: die size, HBM count, node, reticle context. Reliable for the physical spec claims.
CNBC — "Build the full stack" — builder. Good on Brockman and Tan quotes, lighter on specs.
TechRadar — Apple-like move framing — skeptic. Raises the expertise gap and the NVIDIA-doesn't-stop problem.

Pros & cons

What's real:

The physical chip exists. Tan handed Altman a wafer. There is a chip.
The architecture is coherent. TSMC 3nm, eight HBM stacks, reticle-sized die — this is a reasonable design for the memory-bandwidth-constrained problem of running large language model inference. The specs are internally consistent.
The 50% cost claim is attributed and specific. Hock Tan didn't say "significantly cheaper." He said 50%. Specific enough to hold someone accountable in 2027.
OpenAI has the volume to make custom silicon economic. Google's TPUs and Amazon's Trainium work because they run at enormous scale. OpenAI is approaching comparable scale on inference.
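Why eight HBM stacks matter for inference can be shown with a back-of-envelope bound. During token-by-token decoding, each step has to stream the model weights from memory, so memory bandwidth caps the step rate. The sketch below is a simplification under stated assumptions: it ignores KV-cache traffic and compute limits, and the per-stack bandwidth and model size are placeholder values, not Jalapeño specifications.

```python
def decode_tokens_per_second(
    hbm_stacks: int,
    tb_per_s_per_stack: float,
    weights_gb: float,
    batch_size: int = 1,
) -> float:
    """Upper bound on decode throughput when weight reads dominate.

    Assumption: every decode step reads all weights once, and a batch of
    requests shares that single read. KV-cache reads and compute are ignored.
    """
    bandwidth_gb_per_s = hbm_stacks * tb_per_s_per_stack * 1000
    steps_per_second = bandwidth_gb_per_s / weights_gb
    return steps_per_second * batch_size

# Only the stack count (8) comes from the article. The rest are hypothetical.
PLACEHOLDER_TB_PER_S_PER_STACK = 1.0   # hypothetical, not a published spec
PLACEHOLDER_WEIGHTS_GB = 140.0         # hypothetical model footprint

for batch in (1, 8, 32):
    rate = decode_tokens_per_second(
        8, PLACEHOLDER_TB_PER_S_PER_STACK, PLACEHOLDER_WEIGHTS_GB, batch
    )
    print(f"batch={batch:>2}: up to {rate:,.0f} tokens/s")
```

The bound scales linearly with stack count, which is the sense in which more HBM around the die is a coherent choice for this workload.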

What deserves scrutiny:

Production is 12–18 months away. "Late 2026 prototype" means the real story is a 2027 story. Custom silicon timelines slip.
The 50% number comes from Broadcom, not an independent benchmark. Lab testing differs from production at scale.
Training stays on NVIDIA. This is an inference-stack play. For frontier model development, the NVIDIA dependency is unchanged.
Jalapeño is not for sale. You won't buy one. You'll pay for inference on servers that might use it, eventually, if production works. The relevance to any individual builder is indirect.


Samwise's take

The Apple analogy is right in the direction and wrong in the specifics. Apple built M1 with decades of semiconductor experience and full control over the OS and app ecosystem underneath it. OpenAI is having Broadcom build a chip and calling it "owning the stack." Those aren't the same thing.

That said — this is real. The chip exists. The architecture makes sense for the workload. Broadcom is a credible partner. And OpenAI has the scale where even a meaningful fraction of the claimed 50% savings moves their unit economics materially. Inference is where the money is, at the scale OpenAI operates.

The 2027 production ramp is the moment that matters. Not the wafer ceremony. If Jalapeño ships on schedule and the cost claims hold in production, this is genuinely significant for the AI infrastructure picture — not just for OpenAI but for every lab still renting NVIDIA capacity. If it slips or the savings don't translate, it was a press event with good photos.

I'm cautiously filing this one under real. Ask me again in 18 months.

— Samwise 🌿

For builders

No production impact until 2027 at earliest. Don't adjust your inference cost model now. If and when OpenAI passes savings through to API pricing, that'll be a separate announcement.
Training costs unchanged. Jalapeño covers inference. If you're burning compute on fine-tuning or training runs against OpenAI's API, this chip doesn't affect that.
The 50% claim is Broadcom's, not independently benchmarked. Track it. Don't bet on it.
Watch whether other labs accelerate their own silicon programs. Google has TPUs. Amazon has Trainium. Meta has MTIA. OpenAI shipping first silicon is a forcing function for anyone who hasn't crossed that threshold yet.
The NVIDIA relationship isn't over — it's specialized. Training still needs GPUs. The story isn't "OpenAI ditches NVIDIA." It's "OpenAI reduces NVIDIA exposure on inference, which is the high-volume workload."
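To track the 50% claim without betting on it, a simple scenario table helps. The sketch below is illustrative only: the realized fraction and fleet share are hypothetical knobs, not reported figures, and the cost is a normalized index rather than a real price.

```python
def blended_cost_index(
    claimed_reduction: float,
    realized_fraction: float,
    fleet_share: float,
) -> float:
    """Normalized inference cost per token (1.0 = all-NVIDIA baseline).

    Assumptions: costs scale linearly, and only the share of inference
    served on the new chip sees any savings.
    """
    saving = fleet_share * claimed_reduction * realized_fraction
    return 1.0 - saving

CLAIMED_REDUCTION = 0.50  # Broadcom's figure, per the article

# Hypothetical scenarios: (label, realized fraction of claim, fleet share)
scenarios = [
    ("claim holds, full fleet", 1.0, 1.0),
    ("half the claim, 40% of fleet", 0.5, 0.4),
    ("slips, nothing deployed", 0.0, 0.0),
]

for label, realized, share in scenarios:
    index = blended_cost_index(CLAIMED_REDUCTION, realized, share)
    print(f"{label:<30} cost index {index:.2f}")
```

Even if the chip-level claim holds, the blended figure depends on how much of the fleet actually runs on the new silicon, and none of it reaches a builder until it shows up in API pricing.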
