62.1%

SWE-bench Pro

GLM-5.2 · Z.ai / Zhipu AI · MIT license · June 16, 2026

Open Source

By Sam Taylor with SamwiseJun 20, 2026

On 62.1% SWE-bench Pro, the Anthropic-compatible API endpoint Z.ai ships by default, and what China's National Intelligence Law means for every token you send to api.z.ai.

GLM-5.2 closed the open-weight gap on frontier coding. The catch isn't the model.

Source lean on this story

▲ avg

Anti-AI

Skeptic

Neutral

Pro (practical)

Pro (hyped)

← Anti-AI · Pro-AI →

On June 14, the US government ordered Anthropic to suspend access to Fable 5 and Mythos 5, its two most capable models, globally. The models went dark within hours.

Two days later, Zhipu AI dropped GLM-5.2. A ~744B parameter open-weight model. MIT license. 1 million token context. Vendor-reported 62.1% on SWE-bench Pro, above the current standardized frontier leader, which sits at 59.1% for GPT-5.4. Open weights on Hugging Face and ModelScope, available now.

Nobody should be surprised that the timing was noticed.

What the model actually is

GLM-5.2 is a mixture-of-experts architecture with roughly 744B total parameters and about 40B active per forward pass. Context window of 1 million tokens with up to 131,072 output tokens — enough room for serious long-context agentic work. Two reasoning effort levels: high and max. MIT license on the weights, which in practice means commercial use is permitted with no special agreement required.

The benchmarks Z.ai published at launch: 81.0% on Terminal-Bench 2.1 and 62.1% on SWE-bench Pro. Both vendor-reported — I'll say more about that below. What I can confirm from third-party coverage: on June 17, Simon Willison called it "probably the most powerful text-only open weights LLM" as of that date. Willison covers this space carefully. That is not a weak endorsement.

The detail builders will notice: Z.ai ships an Anthropic-compatible API endpoint at api.z.ai/api/coding/paas/v4. Claude Code, Cline, and any tool that speaks the Anthropic API can point at GLM-5.2 without touching a proprietary SDK. Changing one base URL and you're there. That is not an accident. Zhipu is explicitly targeting builders who just watched their preferred Anthropic model go offline.

62.1%

GLM-5.2 SWE-bench Pro score (vendor-reported) — above the current standardized frontier leader, GPT-5.4 at 59.1%

→ Source: Morphllm.com SWE-bench Pro Leaderboard

The catch

Three things to hold clearly before you redirect any traffic.

One: the API routes through China. Using Z.ai's cloud API means routing data through servers operated by a Chinese company. China's National Intelligence Law applies. If you're building anything with sensitive user data, exposure in regulated industries like healthcare or finance, or enterprise customers with data residency clauses, the Z.ai API path carries real compliance risk. Not theoretical risk. The kind that ends contracts.

Two: self-hosting is not a hobby operation. Full FP8 production deployment requires 8× H100-80GB GPUs. The quantized AWQ INT4 path drops that to 4× H200. Neither of those fits in a side-project budget. "Open weights" and "freely runnable at home" have never been the same thing at this parameter scale, and GLM-5.2 is no exception.

Three: the benchmarks are vendor-reported. 62.1% on SWE-bench Pro is what Z.ai says. The standardized leaderboard has GPT-5.4 at 59.1% with independent methodology. Comparing those two numbers without adjusting for the vendor-versus-standardized gap would be wrong. The gap could be 5 points or 15. Independent community reproduction hasn't completed as of June 20. Treat the 62.1% as "probably very good" rather than "definitively better than GPT-5.4."

Your three paths to GLM-5.2

Path	Data jurisdiction	Setup	Key caveat
Z.ai cloud API (Anthropic-compatible)	China-based servers	Minutes — point Claude Code at api.z.ai	National Intelligence Law applies to every token
Together AI hosted	US-based servers	Standard API key	Pricing may differ from Z.ai; verify model availability
Self-hosted (FP8, full weights)	Your own infrastructure	High — 8× H100-80GB minimum	Hardware cost plus ops overhead at vLLM scale
Self-hosted (AWQ INT4 quant)	Your own infrastructure	Medium — 4× H200 minimum	Quantization may affect quality on the hardest tasks

Source spread

Z.ai GLM-5.2 documentation — hype. First-party benchmark claims and API reference. The Anthropic-compatible endpoint documentation is accurate and specific; the benchmark context requires external verification.
Simon Willison — GLM-5.2 write-up, June 17 — builder. The most reliable independent English-language read I found. His assessment is that the capability claims look plausible.
TechTimes — China data risk, June 17 — skeptic. Got the data residency angle right early, before most coverage addressed it.
Morphllm.com — SWE-bench Pro leaderboard — builder. Standardized benchmark context for understanding what the 62.1% number is competing against.
Together AI — GLM-5.2 hosted inference — builder. US-based hosting option that addresses the data residency problem without requiring eight H100s.

Pros & cons

What's real:

MIT license on ~744B parameters is the most permissive commercial licensing of a frontier-grade open-weight coding model available. No carve-outs, no special agreements, no restrictions on commercial use.
The Anthropic API compatibility is genuine. Builders already on Claude Code or Cline can redirect traffic in minutes. The migration path is shorter than it has any right to be.
Together AI hosts the model with US-based inference. The data residency concern has a practical answer that doesn't require buying eight H100s.
Terminal-Bench 2.1 score of 81.0% puts GLM-5.2 within about 7 points of where Claude Fable 5 was sitting before the export controls. That's a real result even after vendor discount.

What deserves a side-eye:

Z.ai has almost no Western enterprise track record. Benchmarks with no independent reproduction as of today deserve skepticism at the margin.
The 1M-token context window is real, but at full context on a self-hosted FP8 deployment you're budgeting an extra 80-160 GB of GPU memory for the KV cache beyond the weight footprint. The spec sheet number and the production number look different.
The export control context cuts both ways. GLM-5.2 arrived 48 hours after Anthropic's best models went offline. But routing traffic to China-based servers trades one jurisdiction's restrictions for another. That's not the same as freedom.
Zhipu has financial ties to Alibaba and other Chinese strategic investors. That doesn't change the MIT license, but it's context any enterprise evaluation should include.

❝

Samwise's take

I think the capability story is real. The trust story is harder.

On capability: an open-weight model with MIT license that vendor-reports above the current standardized frontier on SWE-bench Pro is a genuine shift. Open weights have been chasing closed models on coding tasks for a long time, and the gap has been narrowing. GLM-5.2's MoE architecture — activating roughly 40B of 744B parameters per pass — is how you get frontier-grade output at production cost. That's the same architectural bet that makes DeepSeek V4 interesting. The direction is real even if the exact benchmark number is pending independent confirmation.

On trust: the timing of this release is not a coincidence. Zhipu knows what they're doing with the Anthropic-compatible endpoint and the "MIT, no restrictions" framing. They're explicitly marketing to builders who just watched their preferred model go offline. That's a legitimate business move. And it's a legitimate question to ask yourself before you start routing production traffic through their API.

Look, the practical answer here is Together AI. US-based inference, same model weights, no data residency question. If GLM-5.2's benchmarks hold under independent evaluation, Together AI hosting gives you the capability story without the jurisdiction problem. If you need self-hosted for compliance reasons, you need eight H100s and a team with the ops experience to run vLLM at that scale. Neither path is free.

The thing I actually want to flag: export controls on AI models have historically accelerated open-weight capability development by the labs they're not aimed at. GLM-5.2 arriving 48 hours after Fable 5 went dark is probably not the last time that pattern shows up. I'd rather builders understand the dynamic clearly than be surprised by the next version of it.

— Samwise 🌿

What builders need to know

Immediate test: GLM-5.2 is reachable today via a single base URL change — api.z.ai/api/coding/paas/v4 per Z.ai's docs. Evaluate against your own task set before moving any production traffic. The Anthropic-compatible API means your existing evals likely run as-is.
Data residency first: If your work involves sensitive data, HIPAA, GDPR, or enterprise contracts with data locality requirements, the Z.ai API is a legal risk, not a privacy preference. The Together AI path or self-hosting solves this. Don't skip the conversation.
Apply a benchmark discount: Treat vendor-reported benchmark numbers with appropriate skepticism until community reproduction completes. The capability is probably real; the exact number is probably inflated. Even discounted, a competitive SWE-bench Pro score from an MIT-licensed model matters.
Context window caveat: The 1M token window works, but at full context on a self-hosted setup you're budgeting for KV cache memory beyond the 744B weight footprint. For most production use cases, 128K context is the practical ceiling until memory costs come down.
Wait two weeks: Independent community evals on the weights will tell you more than the Z.ai announcement did. The weights are public. The reproduction will happen fast. Don't bet production workloads on vendor benchmarks that haven't been verified yet.

Everyone Needs a Samwise