Inside the NVIDIA Vera Rubin Platform
Abstract
NVIDIA's Vera Rubin platform represents the next generation of AI compute infrastructure, treating the data center — not a single GPU — as the unit of compute. The platform comprises six co-designed chips (seven with the later addition of Groq 3 LPX) engineered for integrated operation at rack scale.
Key Contributions
- Rubin GPU delivers 50 PFLOPS NVFP4 inference and 35 PFLOPS training — a 5x and 3.5x improvement over Blackwell respectively
- First architecture to use HBM4 memory: 288GB per GPU at 22 TB/s bandwidth (2.8x over Blackwell's 8 TB/s)
- NVLink 6 provides 3.6 TB/s bidirectional bandwidth per GPU (2x Blackwell), with 260 TB/s aggregate in an NVL72 rack, more than the estimated bandwidth of the entire global internet
- Vera CPU with 88 custom Olympus cores (Arm), 1.5TB LPDDR5X at 1.2 TB/s, coherent CPU-GPU link at 1.8 TB/s
- 336 billion transistors per Rubin GPU (up from 208B on Blackwell)
- Rack-scale confidential computing with third-generation trusted execution
Architecture Details
Six Core Chips
- Vera CPU — 88 custom Olympus cores, 176 threads via Spatial Multithreading, 162MB unified L3 cache, PCIe Gen6 with CXL 3.1
- Rubin GPU — 224 SMs, fifth-gen Tensor Cores optimized for NVFP4/FP8, expanded special function units for attention/activation/sparse compute
- NVLink 6 Switch — 36 switches per NVL72 rack, in-network SHARP FP8 acceleration (14.4 TFLOPS per tray), hot-swappable trays
- ConnectX-9 — 800 Gb/s per port, 1.6 Tb/s quad SuperNIC per tray, 800 Gb/s inline cryptography
- BlueField-4 DPU — 64-core Grace CPU, 800 Gb/s networking, 20M IOPS NVMe storage, ASTRA trust architecture
- Spectrum-6 Ethernet — 102.4 Tb/s per switch, co-packaged silicon photonics (64x signal integrity improvement)
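The value of SHARP in the NVLink switch is that reductions happen in the network fabric instead of on the GPUs. A minimal sketch of the step-count difference (the ring-allreduce baseline and step model are standard collective-communication analysis, not figures from the article):

```python
# Sketch (not NVIDIA's implementation): latency-bound step counts for an
# allreduce across N GPUs, classic ring vs. in-network (SHARP-style) reduction.

def ring_allreduce_steps(n: int) -> int:
    # Classic ring allreduce: (n-1) reduce-scatter steps + (n-1) all-gather steps,
    # with the reduction math running on the GPUs themselves.
    return 2 * (n - 1)

def sharp_allreduce_steps() -> int:
    # In-network reduction: each GPU sends its data into the switch tree once
    # and receives the reduced result once, regardless of GPU count; the
    # reduction math runs in the switch (14.4 TFLOPS FP8 per tray).
    return 2

n_gpus = 72  # one NVL72 rack
print(ring_allreduce_steps(n_gpus))  # 142 steps
print(sharp_allreduce_steps())       # 2 steps
```

The constant step count is why in-network reduction matters most for latency-sensitive collectives such as the frequent small allreduces in training.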
Performance vs. Blackwell
| Metric | Blackwell | Rubin | Improvement |
|---|---|---|---|
| NVFP4 Inference | 10 PFLOPS | 50 PFLOPS | 5x |
| NVFP4 Training | 10 PFLOPS | 35 PFLOPS | 3.5x |
| HBM Bandwidth | 8 TB/s | 22 TB/s | 2.8x |
| NVLink per GPU | 1.8 TB/s | 3.6 TB/s | 2x |
| Transistors | 208B | 336B | 1.6x |
| HBM Capacity | 192 GB | 288 GB | 1.5x |
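The NVFP4 numbers above refer to a block-scaled 4-bit floating-point format. A hedged illustration of how 4-bit E2M1 quantization with a per-block scale works (the E2M1 value set is the standard FP4 encoding; treating the scale as a separate per-block factor follows NVIDIA's published block-scaling scheme, but the details here are an assumption, not taken from this article):

```python
# Illustrative 4-bit E2M1 quantization with a per-block scale factor,
# in the spirit of NVFP4. Block size and scale handling are assumptions.

E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # positive FP4 (E2M1) magnitudes

def quantize_block(block):
    """Scale a block so its max magnitude maps to 6.0 (the largest E2M1
    value), then round each element to the nearest representable value.
    Returns the dequantized values and the scale stored alongside them."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0  # stored separately per block (e.g. as an FP8 factor)
    out = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        out.append(nearest * scale * (1 if x >= 0 else -1))
    return out, scale

deq, s = quantize_block([0.9, -2.4, 0.1, 6.0])
print(deq)  # [1.0, -2.0, 0.0, 6.0] -- coarse values, recovered via the scale
```

The per-block scale is what keeps a 4-bit format usable: outliers only distort the block they live in, not the whole tensor.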
NVL72 Rack Specs
- 72 Rubin GPUs with all-to-all NVLink topology
- 260 TB/s aggregate scale-up bandwidth
- Per tray: 200 PFLOPS, 14.4 TB/s NVLink, 2TB fast memory
- Rack power: 180-220 kW (fully liquid-cooled)
- Cableless modular trays using Paladin HD2 connectors (assembly: 5 min vs 2 hours)
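The rack-level figures follow directly from the per-GPU specs. A back-of-envelope check (the 4-GPUs-per-tray layout is an assumption consistent with the per-tray numbers above):

```python
# Sanity-check the NVL72 rack numbers from the per-GPU specs in this article.

GPUS_PER_RACK = 72
GPUS_PER_TRAY = 4          # assumption: 18 compute trays of 4 GPUs each
PFLOPS_PER_GPU = 50        # NVFP4 inference
NVLINK_TBPS_PER_GPU = 3.6  # bidirectional

print(GPUS_PER_TRAY * PFLOPS_PER_GPU)           # 200 PFLOPS per tray
print(GPUS_PER_TRAY * NVLINK_TBPS_PER_GPU)      # 14.4 TB/s NVLink per tray
print(round(GPUS_PER_RACK * NVLINK_TBPS_PER_GPU, 1))  # 259.2, i.e. the ~260 TB/s aggregate
print(GPUS_PER_RACK * PFLOPS_PER_GPU / 1000)    # 3.6 EFLOPS NVFP4 per rack
```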
Target Workloads
- Long-context inference (100K+ tokens)
- Mixture-of-Experts models with dynamic routing
- Agentic reasoning pipelines
- Continuous training/post-training
- Multi-tenant, multi-model execution
- MoE inference: claimed 10x lower cost per token vs Blackwell
- MoE training: claimed 4x reduction in GPU count
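Why the HBM4 bandwidth matters for these workloads: at small batch sizes, LLM decode must stream the active weights from memory once per token, so bandwidth caps token rate. A rough roofline-style estimate (the model sizes and byte counts below are illustrative assumptions, not figures from the article):

```python
# Rough bandwidth-bound ceiling on batch-1 decode throughput per GPU.
# Assumes weights are re-read from HBM for every generated token.

def max_tokens_per_s(active_params_b: float, bytes_per_param: float, hbm_tbps: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return hbm_tbps * 1e12 / bytes_per_token

# Hypothetical 70B-active-parameter model with FP4 weights (0.5 bytes/param)
print(round(max_tokens_per_s(70, 0.5, 22)))  # ~629 tokens/s ceiling at 22 TB/s (Rubin)
print(round(max_tokens_per_s(70, 0.5, 8)))   # ~229 tokens/s ceiling at 8 TB/s (Blackwell)
```

MoE models compound this: routing activates only a fraction of the parameters per token, shrinking `active_params_b` and raising the same ceiling further.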
Deployment Timeline
- CES 2026: Architecture announced, full production confirmed
- H2 2026: Systems shipping to customers
- March 2026 update: Vera Rubin POD announced with seventh chip
Limitations
- Extreme power density (180-220 kW per rack) requires purpose-built liquid-cooling infrastructure
- Co-packaged silicon photonics for Spectrum-6 is cutting-edge and may face yield challenges at scale
- Premium pricing — the "extreme co-design" strategy deepens vendor lock-in vs. open standards
Source: Inside the NVIDIA Vera Rubin Platform by Kyle Aubrey, NVIDIA
Tags
gpu-architecture, ai-compute, nvidia, hbm4, nvlink, data-center