HBM Architectural Shakeup: HBM4, HBM4E, C-HBM4E — 3nm Base Dies Enable 2.5x Performance
HBM base dies move from DRAM to 3nm logic (TSMC N3P) — enables 2.5x bandwidth (3 TB/s per stack), 2x channels, C-HBM4E adds custom base dies with near-memory compute
High Bandwidth Memory is undergoing its biggest architectural shift since its introduction: base dies move from DRAM processes to logic processes (TSMC 12FFC/N5/N3P), the interface doubles to 2,048 bits, and per-stack bandwidth reaches 3 TB/s. C-HBM4E introduces custom base dies with optional near-memory compute — "potentially the biggest shift in how computers work in decades."
HBM4: The Standard Foundation
- 2,048-bit interface (doubled from prior generations).
- 8 GT/s official, 10+ GT/s in implementations, 12.8 GT/s demonstrated (Cadence PHY).
- 2 TB/s bandwidth per stack at the official 8 GT/s rate (2,048 bits × 8 Gb/s per pin).
- 32 channels per stack (doubled concurrency).
- 4-Hi to 16-Hi stacks, up to 64 GB.
- Base dies on TSMC 12FFC or N5 — twice as power-efficient as HBM3E's DRAM-process base dies.
- Operating voltage 0.75–0.8V (vs 1.1V HBM3E).
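The per-stack bandwidth figures follow directly from interface width and per-pin data rate. A quick sanity check in plain Python (no HBM-specific API involved):

```python
def stack_bandwidth_tbs(io_width_bits: int, data_rate_gbps: float) -> float:
    """Per-stack bandwidth in TB/s: width (bits) x per-pin rate (Gb/s),
    divided by 8 bits/byte and 1000 GB/TB (decimal units)."""
    return io_width_bits * data_rate_gbps / 8 / 1000

# HBM4 at the official 8 GT/s rate over a 2,048-bit interface:
print(stack_bandwidth_tbs(2048, 8.0))   # 2.048 -> ~2 TB/s
# HBM4E at 12 Gb/s per pin:
print(stack_bandwidth_tbs(2048, 12.0))  # 3.072 -> ~3 TB/s
```

The same formula applied to HBM3E (1,024 bits × 9.4 Gb/s) gives ~1.2 TB/s, matching the comparison table below.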
HBM4E: Performance Scaling
| Metric | HBM3E | HBM4E | Improvement |
|---|---|---|---|
| Bandwidth | 1.2 TB/s | 3 TB/s | 2.5× |
| Speed/pin | 9.4 Gbps | 12 Gbps | 1.3× |
| I/O width | 1,024 | 2,048 | 2× |
| Channels | 16 | 32 | 2× |
| Power efficiency | baseline | 1.7× | — |
| Area efficiency | baseline | 1.8× | — |
Advanced HBM4E variants use N5 or N3P base dies.
C-HBM4E: Customization Framework
C-HBM4E pairs standard HBM4E DRAM stacks with custom base dies, at three levels of customization:
- Logic integration — custom logic/caches on base die, standard HBM4E interface retained.
- Custom D2D interface — HBM4E memory controller on the logic base die; custom die-to-die PHY reduces trace requirements; more HBM stacks per SoC without package expansion. Manufactured on TSMC N3P.
- Near-memory compute (NMC) — basic processing inside memory devices; requires topology-aware software, runtime, compiler, OS changes for heterogeneous memory domains.
Roadmap
| Variant | Availability | Status |
|---|---|---|
| HBM4 | 2026 | GUC PHY taped out on N3P (Mar 2025); silicon validation Q1 2026 |
| HBM4E | 2026–2027 | In development |
| C-HBM4E | 2026–2027 | In development |
Manufacturers & Key Players
DRAM: Micron (high-volume HBM4 for NVIDIA Vera Rubin), SK Hynix, Samsung. Design/IP: TSMC (base dies), GUC (PHY), Rambus (controller / C-HBM4E guidance), Cadence (12.8 GT/s PHY), Siemens EDA, Synopsys.
Integration With NVIDIA Rubin
- Rubin Ultra GPU = 1 TB HBM4E.
- 8 HBM4 stacks → potential 16 TB/s bandwidth per accelerator.
- 64 GB stacks expected post-late 2027, aligning with HBM4E adoption.
- Custom D2D interfaces enable memory subsystems with 1 TB capacity and 48 TB/s bandwidth.
- NMC "unlocks potentially the biggest shift in how computers work in decades."
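The accelerator-level figures are straight multiples of the per-stack numbers. A minimal check, assuming the 1 TB / 48 TB/s configuration uses 16 HBM4E stacks (a stack count inferred from the 64 GB and 3 TB/s per-stack figures, not stated in the source):

```python
def accelerator_memory(stacks: int, stack_bw_tbs: float, stack_cap_gb: int):
    """Aggregate HBM bandwidth (TB/s) and capacity (GB) across one package."""
    return stacks * stack_bw_tbs, stacks * stack_cap_gb

# 8 HBM4 stacks at 2 TB/s each -> 16 TB/s aggregate
print(accelerator_memory(8, 2.0, 32))    # (16.0, 256)
# Assumed 16 HBM4E stacks at 3 TB/s / 64 GB each -> 48 TB/s, 1024 GB (~1 TB)
print(accelerator_memory(16, 3.0, 64))   # (48.0, 1024)
```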
Power & Security
- VDDQ range 0.679–0.963V (vendor-specific binning).
- VDDC options 0.97V or 1.07V.
- Directed Refresh Management (DRFM) mitigates row-hammer attacks.
Software Implications for NMC
- Programming models need extensions for in-memory execution.
- OS must support heterogeneous memory domains with non-uniform latency.
- Profilers must observe execution inside memory devices.
- Runtime schedulers need explicit knowledge of bank structure and channel placement.
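What a topology-aware runtime might look like can be sketched at a high level. Everything below is illustrative: no C-HBM4E software API has been published, so the types and function names are invented stand-ins, and the "in-memory" path is simulated on the host.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model: every buffer records its HBM stack and channel
# placement, and a scheduler dispatches a reduction either to near-memory
# compute (NMC) on the owning stack or to ordinary host execution.

@dataclass
class Buffer:
    data: list[float]
    stack: int      # HBM stack holding the buffer
    channel: int    # channel placement within the stack (0-31 for HBM4)

def nmc_reduce(buf: Buffer) -> float:
    """Stand-in for a reduction executed on the stack's base-die logic."""
    return sum(buf.data)

def host_reduce(buf: Buffer) -> float:
    """Ordinary reduction after moving data across the interface."""
    return sum(buf.data)

def schedule_reduce(buf: Buffer, nmc_stacks: set[int]) -> float:
    # Topology-aware dispatch: run in-memory only if the owning stack has NMC.
    op: Callable[[Buffer], float] = nmc_reduce if buf.stack in nmc_stacks else host_reduce
    return op(buf)

b = Buffer(data=[1.0, 2.0, 3.0], stack=0, channel=5)
print(schedule_reduce(b, nmc_stacks={0, 1}))  # 6.0
```

The point of the sketch is the dispatch decision: once memory domains have non-uniform latency and local compute, placement (stack, channel) becomes a first-class input to the scheduler rather than an invisible hardware detail.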
Why This Matters
HBM is the AI memory wall. Moving base dies from DRAM processes to TSMC N3P logic lets memory evolve at the cadence of logic — 2.5× bandwidth and near-memory compute within two product cycles. This is the most under-appreciated frontier in AI hardware because it reframes memory from a passive bottleneck to an active compute substrate.
Source: HBM undergoes major architectural shakeup, Anton Shilov, Tom's Hardware, Dec 2 2025.