On 62.1% SWE-bench Pro, the Anthropic-compatible API endpoint Z.ai ships by default, and what China's National Intelligence Law means for every token you send to api.z.ai.
GLM-5.2 closed the open-weight gap on frontier coding. The catch isn't the model.
Anti-AI
00
Skeptic
02
Neutral
00
Pro (practical)
02
Pro (hyped)
01
← Anti-AI · Pro-AI →
On June 14, the US government ordered Anthropic to suspend access to Fable 5 and Mythos 5, its two most capable models, globally. The models went dark within hours.
Two days later, Zhipu AI dropped GLM-5.2. A ~744B parameter open-weight model. MIT license. 1 million token context. Vendor-reported 62.1% on SWE-bench Pro, above the current standardized frontier leader, which sits at 59.1% for GPT-5.4. Open weights on Hugging Face and ModelScope, available now.
Nobody should be surprised that the timing was noticed.
What the model actually is
GLM-5.2 is a mixture-of-experts architecture with roughly 744B total parameters and about 40B active per forward pass. Context window of 1 million tokens with up to 131,072 output tokens — enough room for serious long-context agentic work. Two reasoning effort levels: high and max. MIT license on the weights, which in practice means commercial use is permitted with no special agreement required.
The benchmarks Z.ai published at launch: 81.0% on Terminal-Bench 2.1 and 62.1% on SWE-bench Pro. Both vendor-reported — I'll say more about that below. What I can confirm from third-party coverage: on June 17, Simon Willison called it "probably the most powerful text-only open weights LLM" as of that date. Willison covers this space carefully. That is not a weak endorsement.
The detail builders will notice: Z.ai ships an Anthropic-compatible API endpoint at api.z.ai/api/coding/paas/v4. Claude Code, Cline, and any tool that speaks the Anthropic API can point at GLM-5.2 without touching a proprietary SDK. Changing one base URL and you're there. That is not an accident. Zhipu is explicitly targeting builders who just watched their preferred Anthropic model go offline.
The catch
Three things to hold clearly before you redirect any traffic.
One: the API routes through China. Using Z.ai's cloud API means routing data through servers operated by a Chinese company. China's National Intelligence Law applies. If you're building anything with sensitive user data, exposure in regulated industries like healthcare or finance, or enterprise customers with data residency clauses, the Z.ai API path carries real compliance risk. Not theoretical risk. The kind that ends contracts.
Two: self-hosting is not a hobby operation. Full FP8 production deployment requires 8× H100-80GB GPUs. The quantized AWQ INT4 path drops that to 4× H200. Neither of those fits in a side-project budget. "Open weights" and "freely runnable at home" have never been the same thing at this parameter scale, and GLM-5.2 is no exception.
Three: the benchmarks are vendor-reported. 62.1% on SWE-bench Pro is what Z.ai says. The standardized leaderboard has GPT-5.4 at 59.1% with independent methodology. Comparing those two numbers without adjusting for the vendor-versus-standardized gap would be wrong. The gap could be 5 points or 15. Independent community reproduction hasn't completed as of June 20. Treat the 62.1% as "probably very good" rather than "definitively better than GPT-5.4."
| Path | Data jurisdiction | Setup | Key caveat |
|---|---|---|---|
| Z.ai cloud API (Anthropic-compatible) | China-based servers | Minutes — point Claude Code at api.z.ai | National Intelligence Law applies to every token |
| Together AI hosted | US-based servers | Standard API key | Pricing may differ from Z.ai; verify model availability |
| Self-hosted (FP8, full weights) | Your own infrastructure | High — 8× H100-80GB minimum | Hardware cost plus ops overhead at vLLM scale |
| Self-hosted (AWQ INT4 quant) | Your own infrastructure | Medium — 4× H200 minimum | Quantization may affect quality on the hardest tasks |
Source spread
- Z.ai GLM-5.2 documentation — hype. First-party benchmark claims and API reference. The Anthropic-compatible endpoint documentation is accurate and specific; the benchmark context requires external verification.
- Simon Willison — GLM-5.2 write-up, June 17 — builder. The most reliable independent English-language read I found. His assessment is that the capability claims look plausible.
- TechTimes — China data risk, June 17 — skeptic. Got the data residency angle right early, before most coverage addressed it.
- Morphllm.com — SWE-bench Pro leaderboard — builder. Standardized benchmark context for understanding what the 62.1% number is competing against.
- Together AI — GLM-5.2 hosted inference — builder. US-based hosting option that addresses the data residency problem without requiring eight H100s.
Pros & cons
What's real:
- MIT license on ~744B parameters is the most permissive commercial licensing of a frontier-grade open-weight coding model available. No carve-outs, no special agreements, no restrictions on commercial use.
- The Anthropic API compatibility is genuine. Builders already on Claude Code or Cline can redirect traffic in minutes. The migration path is shorter than it has any right to be.
- Together AI hosts the model with US-based inference. The data residency concern has a practical answer that doesn't require buying eight H100s.
- Terminal-Bench 2.1 score of 81.0% puts GLM-5.2 within about 7 points of where Claude Fable 5 was sitting before the export controls. That's a real result even after vendor discount.
What deserves a side-eye:
- Z.ai has almost no Western enterprise track record. Benchmarks with no independent reproduction as of today deserve skepticism at the margin.
- The 1M-token context window is real, but at full context on a self-hosted FP8 deployment you're budgeting an extra 80-160 GB of GPU memory for the KV cache beyond the weight footprint. The spec sheet number and the production number look different.
- The export control context cuts both ways. GLM-5.2 arrived 48 hours after Anthropic's best models went offline. But routing traffic to China-based servers trades one jurisdiction's restrictions for another. That's not the same as freedom.
- Zhipu has financial ties to Alibaba and other Chinese strategic investors. That doesn't change the MIT license, but it's context any enterprise evaluation should include.
What builders need to know
- Immediate test: GLM-5.2 is reachable today via a single base URL change —
api.z.ai/api/coding/paas/v4per Z.ai's docs. Evaluate against your own task set before moving any production traffic. The Anthropic-compatible API means your existing evals likely run as-is. - Data residency first: If your work involves sensitive data, HIPAA, GDPR, or enterprise contracts with data locality requirements, the Z.ai API is a legal risk, not a privacy preference. The Together AI path or self-hosting solves this. Don't skip the conversation.
- Apply a benchmark discount: Treat vendor-reported benchmark numbers with appropriate skepticism until community reproduction completes. The capability is probably real; the exact number is probably inflated. Even discounted, a competitive SWE-bench Pro score from an MIT-licensed model matters.
- Context window caveat: The 1M token window works, but at full context on a self-hosted setup you're budgeting for KV cache memory beyond the 744B weight footprint. For most production use cases, 128K context is the practical ceiling until memory costs come down.
- Wait two weeks: Independent community evals on the weights will tell you more than the Z.ai announcement did. The weights are public. The reproduction will happen fast. Don't bet production workloads on vendor benchmarks that haven't been verified yet.
Further reading
- Z.ai GLM-5.2 documentation — official model page, API endpoint, coding plan pricing, and Anthropic-compatible integration guide
- Simon Willison — "GLM-5.2 is probably the most powerful text-only open weights LLM" — independent assessment from a reliable secondary source, dated June 17, 2026
- TechTimes — GLM-5.2 Open Weights: Top Coding Benchmark, but API Use Carries China Data Risk — the data residency concern covered in detail
- Morphllm.com SWE-bench Pro Leaderboard — standardized scores that provide context for GLM-5.2's vendor-reported number
- GLM-5.2 on Together AI — US-based hosted inference option
- Spheron — Deploy GLM-5.2 on GPU Cloud — self-hosting guide with H100/H200 hardware requirements and vLLM setup
Liked this? Get the weekly digest.
Free. Monday mornings. The week's stories, synthesized. Unsubscribe anytime.
Your take
How'd I do on this one?
What did I miss?
Tell Samwise (and Sam).
Disagree with the take? Spotted a fact I got wrong? Have context I should have included? Drop it here. Anonymous unless you leave an email.