Key Highlights
- ✓ 744B total parameters with only 40B active via MoE — massive efficiency
- ✓ 86.0% on GPQA-Diamond — beats Claude 3.7 on graduate-level reasoning
- ✓ MIT license — fully open weights, deploy anywhere
- ✓ ~$1/M input tokens — roughly 15x cheaper than Claude Opus
- ✓ Built entirely on Huawei Ascend chips — zero NVIDIA dependency
- ✓ 200K context window with DeepSeek Sparse Attention
The Open-Weights Inflection Point
For years, the AI industry operated on a simple assumption: the best models would always be proprietary. OpenAI, Anthropic, and Google would pour billions into training runs, and open-source would always be a generation behind.
GLM-5 doesn't just challenge that assumption — it demolishes specific corners of it.
What GLM-5 Actually Is
Released February 11, 2026, GLM-5 is a 744 billion parameter mixture-of-experts model from Zhipu AI, the Beijing-based lab that rebranded its international operations as Z.ai. Only about 40 billion parameters are active per token, making the model dramatically cheaper to run than its raw parameter count suggests.
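The efficiency claim is simple arithmetic on the two figures above. A minimal sketch, using only the parameter counts quoted in this article:

```python
# Back-of-the-envelope MoE efficiency, using the figures quoted above.
TOTAL_PARAMS = 744e9    # total parameters
ACTIVE_PARAMS = 40e9    # parameters active per token via MoE routing

# Fraction of the network that actually runs for each token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction per token: {active_fraction:.1%}")  # 5.4%

# Per-token inference compute scales roughly with active parameters,
# so a dense model of the same size would cost about this much more.
dense_ratio = TOTAL_PARAMS / ACTIVE_PARAMS
print(f"Rough compute saving vs. dense 744B: {dense_ratio:.1f}x")  # 18.6x
```

In other words, each token touches only about one-eighteenth of the network, which is why the model can compete on price despite its headline size.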
The model was trained on 28.5 trillion tokens using Huawei Ascend chips and the MindSpore framework. That last detail matters: this is the first frontier-class model built entirely outside the NVIDIA ecosystem.
Where It Wins
GLM-5 doesn't beat proprietary models everywhere. But where it wins, it wins decisively:
Graduate-level reasoning: 86.0% on GPQA-Diamond, surpassing Claude 3.7 (84.8%) and crushing GPT-4.5 (71.4%). This is the benchmark that tests PhD-level science questions.
Coding: 90.0% on HumanEval and 77.8% on SWE-bench Verified — putting it ahead of Gemini 3.0 Pro on real-world software engineering tasks.
Agentic work: State-of-the-art on BrowseComp and Vending Bench 2, the long-horizon benchmark that simulates running a small business. This model can actually do things, not just generate text.
Where It Doesn't
MMLU-Pro tells a different story: 70.4% versus Claude 3.7's 84.0% and GPT-4.5's 86.1%. Broad general knowledge remains a proprietary advantage. And at 200K context, it's behind Claude's 1M window — though most practical use cases don't need more than 200K.
The Price Signal
At roughly $1.00 per million input tokens and $3.20 per million output tokens, GLM-5 sits in value territory that makes proprietary models look increasingly hard to justify for many workloads. That's roughly 15x cheaper than Claude Opus for input and 23x cheaper for output.
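The multipliers above are easy to sanity-check. A minimal cost sketch, where the Claude Opus prices ($15/$75 per million tokens) are back-derived from the article's own 15x/23x ratios and the workload is a made-up example:

```python
# Price comparison using the per-token rates quoted above.
# Opus prices are assumptions derived from the 15x/23x multipliers.
GLM5 = {"input": 1.00, "output": 3.20}    # USD per million tokens
OPUS = {"input": 15.00, "output": 75.00}  # assumed, for illustration

def workload_cost(prices, input_mtok, output_mtok):
    """Cost in USD for a workload measured in millions of tokens."""
    return prices["input"] * input_mtok + prices["output"] * output_mtok

# Hypothetical monthly workload: 500M input tokens, 100M output tokens.
glm = workload_cost(GLM5, 500, 100)
opus = workload_cost(OPUS, 500, 100)
print(f"GLM-5: ${glm:,.0f}/mo")          # $820
print(f"Opus:  ${opus:,.0f}/mo")         # $15,000
print(f"Blended ratio: {opus / glm:.1f}x")  # ~18.3x
```

The blended ratio lands between the input and output multipliers because real workloads mix both token types.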
The real leverage is local deployment. Under MIT license, any organization can run GLM-5 on their own infrastructure — though you'll need serious hardware. Full precision requires ~1.5TB of VRAM; quantized versions need 300GB+.
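Those hardware figures follow directly from the parameter count. A rough weight-memory estimate, ignoring KV cache and activation memory (which add real overhead on top):

```python
# Approximate weight memory for local deployment, from the 744B figure above.
PARAMS = 744e9

def weight_tb(bits_per_param):
    """Weight storage in terabytes; excludes KV cache and activations."""
    return PARAMS * bits_per_param / 8 / 1e12

bf16 = weight_tb(16)  # 16-bit weights
int4 = weight_tb(4)   # 4-bit quantized weights
print(f"BF16: {bf16:.2f} TB")  # ~1.49 TB, matching the ~1.5TB figure
print(f"INT4: {int4:.2f} TB")  # ~0.37 TB (~372 GB), the 300GB+ range
```

Even quantized, that means a multi-GPU node — this is open-weights in the data-center sense, not the laptop sense.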
Why It Matters
GLM-5 represents three converging signals:
1. Open weights reaching parity on specific, high-value benchmarks
2. Non-NVIDIA training infrastructure proving viable at frontier scale
3. Chinese AI labs competing directly with US frontier models despite chip restrictions
For builders and enterprises evaluating their AI stack, GLM-5 changes the calculus. Not because it's better than Claude or GPT at everything — it isn't. But because it's better at enough things, at a low enough cost, with enough openness, to force a serious conversation about vendor lock-in.
