HBM Architectural Shakeup: HBM4, HBM4E, C-HBM4E — 3nm Base Dies Enable 2.5x Performance
HBM base dies move from DRAM to 3nm logic (TSMC N3P) — enables 2.5x bandwidth (3 TB/s per stack), 2x channels, C-HBM4E adds custom base dies with near-memory compute
High Bandwidth Memory is undergoing its biggest architectural shift since its introduction: base dies move from DRAM processes to logic processes (TSMC 12FFC/N5/N3P), the interface doubles to 2,048 bits, and per-stack bandwidth reaches 3 TB/s. C-HBM4E introduces custom base dies with optional near-memory compute — "potentially the biggest shift in how computers work in decades."
HBM4: The Standard Foundation
- 2,048-bit interface (doubled from prior generations).
- 8 GT/s official, 10+ GT/s in implementations, 12.8 GT/s demonstrated (Cadence PHY).
- 2 TB/s bandwidth per stack at the official 8 GT/s rate (2,048 bits × 8 Gb/s per pin).
- 32 channels per stack (doubled concurrency).
- 4-Hi to 16-Hi stacks, up to 64 GB.
- Base dies on TSMC 12FFC or N5 — twice as power-efficient as HBM3E's DRAM-process base dies.
- Operating voltage 0.75–0.8V (vs 1.1V HBM3E).
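The per-stack bandwidth figures follow directly from interface width and per-pin data rate. A quick sanity check in plain Python (no HBM-specific API involved):

```python
def stack_bandwidth_tbs(io_width_bits: int, data_rate_gbps: float) -> float:
    """Per-stack bandwidth in TB/s: width (bits) x per-pin rate (Gb/s),
    divided by 8 bits/byte and 1000 GB/TB (decimal units)."""
    return io_width_bits * data_rate_gbps / 8 / 1000

# HBM4 at the official 8 GT/s rate over a 2,048-bit interface:
print(stack_bandwidth_tbs(2048, 8.0))   # 2.048 -> ~2 TB/s
# HBM4E at 12 Gb/s per pin:
print(stack_bandwidth_tbs(2048, 12.0))  # 3.072 -> ~3 TB/s
```

The same formula applied to HBM3E (1,024 bits × 9.4 Gb/s) gives ~1.2 TB/s, matching the comparison table below.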
HBM4E: Performance Scaling
| Metric | HBM3E | HBM4E | Improvement |
|---|---|---|---|
| Bandwidth | 1.2 TB/s | 3 TB/s | 2.5× |
| Speed/pin | 9.4 Gbps | 12 Gbps | 1.3× |
| I/O width | 1,024 | 2,048 | 2× |
| Channels | 16 | 32 | 2× |
| Power efficiency | baseline | 1.7× | — |
| Area efficiency | baseline | 1.8× | — |
Advanced HBM4E variants use N5 or N3P base dies.
C-HBM4E: Customization Framework
C-HBM4E pairs standard HBM4E DRAM stacks with custom base dies, at three levels of customization:
- Logic integration — custom logic/caches on base die, standard HBM4E interface retained.
- Custom D2D interface — HBM4E memory controller on the logic base die; custom die-to-die PHY reduces trace requirements; more HBM stacks per SoC without package expansion. Manufactured on TSMC N3P.
- Near-memory compute (NMC) — basic processing inside memory devices; requires topology-aware software, runtime, compiler, OS changes for heterogeneous memory domains.
Roadmap
| Variant | Availability | Status |
|---|---|---|
| HBM4 | 2026 | GUC PHY taped out on N3P (Mar 2025); silicon validation Q1 2026 |
| HBM4E | 2026–2027 | In development |
| C-HBM4E | 2026–2027 | In development |
Manufacturers & Key Players
DRAM: Micron (high-volume HBM4 for NVIDIA Vera Rubin), SK Hynix, Samsung. Design/IP: TSMC (base dies), GUC (PHY), Rambus (controller / C-HBM4E guidance), Cadence (12.8 GT/s PHY), Siemens EDA, Synopsys.
Integration With NVIDIA Rubin
- Rubin Ultra GPU = 1 TB HBM4E.
- 8 HBM4 stacks → potential 16 TB/s bandwidth per accelerator.
- 64 GB stacks expected post-late 2027, aligning with HBM4E adoption.
- Custom D2D interfaces enable memory subsystems with 1 TB capacity and 48 TB/s bandwidth.
- NMC "unlocks potentially the biggest shift in how computers work in decades."
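The accelerator-level figures are straight multiples of the per-stack numbers. A minimal check, assuming the 1 TB / 48 TB/s configuration uses 16 HBM4E stacks (a stack count inferred from the 64 GB and 3 TB/s per-stack figures, not stated in the source):

```python
def accelerator_memory(stacks: int, stack_bw_tbs: float, stack_cap_gb: int):
    """Aggregate HBM bandwidth (TB/s) and capacity (GB) across one package."""
    return stacks * stack_bw_tbs, stacks * stack_cap_gb

# 8 HBM4 stacks at 2 TB/s each -> 16 TB/s aggregate
print(accelerator_memory(8, 2.0, 32))    # (16.0, 256)
# Assumed 16 HBM4E stacks at 3 TB/s / 64 GB each -> 48 TB/s, 1024 GB (~1 TB)
print(accelerator_memory(16, 3.0, 64))   # (48.0, 1024)
```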
Power & Security
- VDDQ range 0.679–0.963V (vendor-specific binning).
- VDDC options 0.97V or 1.07V.
- Directed Refresh Management (DRFM) mitigates row-hammer attacks.
Software Implications for NMC
- Programming models need extensions for in-memory execution.
- OS must support heterogeneous memory domains with non-uniform latency.
- Profilers must observe execution inside memory devices.
- Runtime schedulers need explicit knowledge of bank structure and channel placement.
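What a topology-aware runtime might look like can be sketched at a high level. Everything below is illustrative: no C-HBM4E software API has been published, so the types and function names are invented stand-ins, and the "in-memory" path is simulated on the host.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model: every buffer records its HBM stack and channel
# placement, and a scheduler dispatches a reduction either to near-memory
# compute (NMC) on the owning stack or to ordinary host execution.

@dataclass
class Buffer:
    data: list[float]
    stack: int      # HBM stack holding the buffer
    channel: int    # channel placement within the stack (0-31 for HBM4)

def nmc_reduce(buf: Buffer) -> float:
    """Stand-in for a reduction executed on the stack's base-die logic."""
    return sum(buf.data)

def host_reduce(buf: Buffer) -> float:
    """Ordinary reduction after moving data across the interface."""
    return sum(buf.data)

def schedule_reduce(buf: Buffer, nmc_stacks: set[int]) -> float:
    # Topology-aware dispatch: run in-memory only if the owning stack has NMC.
    op: Callable[[Buffer], float] = nmc_reduce if buf.stack in nmc_stacks else host_reduce
    return op(buf)

b = Buffer(data=[1.0, 2.0, 3.0], stack=0, channel=5)
print(schedule_reduce(b, nmc_stacks={0, 1}))  # 6.0
```

The point of the sketch is the dispatch decision: once memory domains have non-uniform latency and local compute, placement (stack, channel) becomes a first-class input to the scheduler rather than an invisible hardware detail.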
Why This Matters
HBM is the AI memory wall. Moving base dies from DRAM processes to TSMC N3P logic lets memory evolve at the cadence of logic — 2.5× bandwidth and near-memory compute within two product cycles. This is the most under-appreciated frontier in AI hardware because it reframes memory from a passive bottleneck to an active compute substrate.
Source: HBM undergoes major architectural shakeup, Anton Shilov, Tom's Hardware, Dec 2 2025.