## The Reasoning Leap Nobody Expected

Google DeepMind launched Gemini 3.1 Pro on February 19, 2026, and quietly retook the AI crown on the metric that matters most: novel reasoning. The headline number is 77.1% on ARC-AGI-2, a benchmark that tests a model's ability to solve logic patterns it has never seen during training. For context, Gemini 3 Pro scored 31.1%. The improvement isn't incremental; it's more than a doubling in a single generation.

## The Price That Didn't Change

Here's what makes this interesting: Gemini 3.1 Pro costs exactly what Gemini 3 Pro cost. $2 per million input tokens. $12 per million output tokens. Zero price increase for a model that more than doubled its reasoning score.

This is the frontier model convergence thesis in action. Claude Opus 4.6 scores 75.6% on SWE-Bench with 1M context. GPT-5.4 scores 83% on GDPval. Gemini 3.1 Pro scores 77.1% on ARC-AGI-2. All three are hitting near-parity at different price points on different benchmarks. The models are converging, and the differentiation is shifting from model quality to ecosystem, developer experience, and pricing.

## What 77.1% on ARC-AGI-2 Means

ARC-AGI-2 isn't a standard benchmark. It tests whether a model can generalize to entirely new problems: patterns it cannot have memorized from training data. A high score here suggests something closer to genuine reasoning than pattern matching.

Google describes the improvement as coming from architectural changes to how the model processes multi-step logic, combined with a 1M-token context window that lets the model hold more working state during complex reasoning chains.
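To make the flat pricing concrete, here is a minimal sketch of the per-request cost math under the quoted rates ($2/M input, $12/M output). The function name and example token counts are my own illustration, not an official calculator.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 2.0,
                 output_price_per_m: float = 12.0) -> float:
    """Dollar cost of one API call at per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m + \
           (output_tokens / 1_000_000) * output_price_per_m

# A 100k-token prompt with a 5k-token response:
print(round(request_cost(100_000, 5_000), 4))  # → 0.26
```

At these rates, even a prompt that uses a tenth of the 1M context costs about a quarter of a dollar, which is why the zero price increase matters for reasoning-heavy workloads.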
## The Convergence Landscape

| Model | Key Benchmark | Score | Input Price |
|-------|---------------|-------|-------------|
| Gemini 3.1 Pro | ARC-AGI-2 | 77.1% | $2/M |
| Claude Opus 4.6 | SWE-Bench | 75.6% | $15/M |
| GPT-5.4 | GDPval | 83.0% | $2.50/M |
| GPT-5.4 Mini | SWE-Bench Pro | 54.4% | $0.75/M |

The takeaway: if you're building on these models, the API provider matters less than your architecture. The moat has shifted from model quality to tooling, ecosystem, and deployment patterns. That's why Google acquired the Antigravity team, OpenAI bought Astral (uv, ruff), and Anthropic acquired Bun. The runtime is the new battleground.

## Why Google Wins on Reasoning-Per-Dollar

At $2 per million input tokens with a 77.1% ARC-AGI-2 score, Gemini 3.1 Pro offers the best reasoning-to-cost ratio of any frontier model. Claude Opus 4.6 costs 7.5x as much per input token; GPT-5.4 costs 25% more. For reasoning-heavy agent workloads that need multiple inference passes, Gemini's pricing advantage compounds.

The enterprise play: Google doesn't need Gemini to be the best model. It needs Gemini to be good enough at the right price, running on Google Cloud, integrated with Workspace. The strategy mirrors what we saw with Microsoft Copilot Cowork: the model is the engine, but distribution is the moat.
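The price comparisons above can be sketched in a few lines. "Points per dollar" here is just benchmark score divided by input price, a rough heuristic I'm using for illustration, not an official metric, and note the scores come from different benchmarks, so the comparison is directional at best.

```python
# Figures taken from the table above.
models = {
    "Gemini 3.1 Pro":  {"score": 77.1, "input_price": 2.00},
    "Claude Opus 4.6": {"score": 75.6, "input_price": 15.00},
    "GPT-5.4":         {"score": 83.0, "input_price": 2.50},
}

base = models["Gemini 3.1 Pro"]["input_price"]
for name, m in models.items():
    ratio = m["input_price"] / base           # price multiple vs Gemini
    per_dollar = m["score"] / m["input_price"]
    print(f"{name}: {ratio:.2f}x Gemini's input price, "
          f"{per_dollar:.1f} points per dollar")
```

Running this reproduces the claims in the text: Claude's input price is 7.5x Gemini's, GPT-5.4's is 1.25x (25% more), and Gemini lands the highest points-per-dollar of the three.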