HBM4 Memory Architecture

Status: Active Frontier
Tags: hbm4, memory-bandwidth, ai-compute, inference

HBM4 is the memory breakthrough that unlocks the next generation of AI inference and training. Debuting in NVIDIA's Rubin GPU, HBM4 delivers 22 TB/s bandwidth per GPU — a 2.8x improvement over Blackwell's 8 TB/s HBM3e — with 288 GB capacity (up from 192 GB).

Memory bandwidth has become the primary bottleneck for large-model inference. Each decode step must stream the active weights and the growing KV cache from memory, transformer attention scales quadratically with context length, and mixture-of-experts models require fast access to large parameter sets under dynamic routing. HBM4's bandwidth gains translate directly into higher inference throughput: NVIDIA claims a 5x inference improvement over Blackwell for the Rubin GPU, with memory bandwidth as the key enabler alongside architectural improvements in the fifth-generation Tensor Cores.
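
To see why bandwidth sets the throughput ceiling, here is a minimal roofline sketch. The 100B-active-parameter FP8 model and batch size of one are illustrative assumptions, not NVIDIA figures; only the 8 TB/s and 22 TB/s bandwidths come from this page.

```python
# Minimal roofline sketch: per-GPU decode throughput when inference is
# memory-bandwidth bound. Model size, precision, and batch are illustrative
# assumptions; only the bandwidth figures come from this page.

def bandwidth_bound_tokens_per_s(hbm_tb_per_s: float,
                                 active_params_billion: float,
                                 bytes_per_param: float = 1.0) -> float:
    """Upper bound on batch-1 decode tokens/s.

    Each generated token must stream the active parameters from HBM at
    least once (KV-cache traffic is ignored here for simplicity).
    """
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return hbm_tb_per_s * 1e12 / bytes_per_token

# Hypothetical 100B-active-parameter MoE served in FP8 (1 byte per param).
for name, bw in [("Blackwell HBM3e", 8.0), ("Rubin HBM4", 22.0)]:
    ceiling = bandwidth_bound_tokens_per_s(bw, 100)
    print(f"{name}: ~{ceiling:.0f} tokens/s per GPU ceiling")
```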

The capacity increase to 288 GB per GPU matters for long-context workloads and large MoE models. In an NVL72 rack, total fast memory reaches approximately 2 TB per tray (with additional LPDDR5X on the Vera CPU side at 1.5 TB per CPU, 1.2 TB/s bandwidth). This aggregate memory pool enables serving models that previously required multiple racks.
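
A back-of-envelope fit check makes the capacity point concrete. This sketch assumes FP8 weights (1 byte per parameter) and hypothetical model and KV-cache sizes; only the 288 GB per-GPU and 72-GPU rack figures come from this page.

```python
# Capacity fit check: do FP8 weights plus KV cache fit in the pooled HBM of
# n GPUs? The model and KV-cache sizes below are hypothetical examples.

GPU_HBM_GB = 288   # per-GPU HBM4 capacity cited above
RACK_GPUS = 72     # GPUs in an NVL72 rack

def fits_in_hbm(params_billion: float, kv_cache_gb: float, n_gpus: int) -> bool:
    """True if FP8 weights (1 byte/param) plus KV cache fit in n_gpus' HBM."""
    weights_gb = params_billion  # 1e9 params * 1 byte, expressed in GB
    return weights_gb + kv_cache_gb <= GPU_HBM_GB * n_gpus

print(fits_in_hbm(2_000, kv_cache_gb=500, n_gpus=8))             # 2T params, 8 GPUs
print(fits_in_hbm(10_000, kv_cache_gb=2_000, n_gpus=RACK_GPUS))  # 10T params, 1 rack
```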

Key Claims

  • 22 TB/s bandwidth per GPU — 2.8x over Blackwell's HBM3e at 8 TB/s. Evidence: strong (NVIDIA Vera Rubin)
  • 288 GB capacity per GPU — 1.5x over Blackwell's 192 GB, enabling larger models in fewer GPUs. Evidence: strong (NVIDIA Vera Rubin)
  • Memory bandwidth is the primary inference bottleneck — HBM4 gains directly enable 5x inference improvement. Evidence: strong (NVIDIA Vera Rubin)
  • Vera CPU memory: 1.5 TB LPDDR5X at 1.2 TB/s — Coherent CPU-GPU link at 1.8 TB/s for unified memory access. Evidence: strong (NVIDIA Vera Rubin)
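
The coherent CPU-GPU link matters for parameter offload. The sketch below compares the time to stage a working set from Vera's LPDDR5X into GPU memory over the 1.8 TB/s link against re-reading the same bytes from HBM4; the 500 GB working set is a hypothetical example, not a published figure.

```python
# Transfer-time sketch for the coherent CPU-GPU link cited above. The 500 GB
# working set (e.g., offloaded MoE expert weights) is a hypothetical example.

def transfer_seconds(gigabytes: float, tb_per_s: float) -> float:
    """Time to move `gigabytes` at a sustained rate of `tb_per_s`."""
    return gigabytes * 1e9 / (tb_per_s * 1e12)

working_set_gb = 500
print(f"CPU -> GPU over 1.8 TB/s coherent link: {transfer_seconds(working_set_gb, 1.8):.2f} s")
print(f"Same bytes read from 22 TB/s HBM4:      {transfer_seconds(working_set_gb, 22.0):.3f} s")
```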

Benchmarks & Data

Metric      HBM3e (Blackwell)   HBM4 (Rubin)   Improvement
Bandwidth   8 TB/s              22 TB/s        2.8x
Capacity    192 GB              288 GB         1.5x
  • Vera CPU adds 1.5 TB LPDDR5X at 1.2 TB/s per CPU (NVIDIA)
  • Coherent CPU-GPU link at 1.8 TB/s (NVIDIA)

Open Questions

  • What are HBM4 yield rates and cost premiums versus HBM3e?
  • How does HBM4 power consumption compare at the per-GPU and per-rack level?
  • Will HBM4 be available to custom ASIC vendors (TPU, Trainium) at competitive timelines?
  • Does the 288 GB capacity ceiling force model architecture choices, or is it sufficient for 2026-2027 model sizes?
