Processing-In-Memory (PIM) & Memory-Centric Computing
Status: Active Frontier
Processing-In-Memory reframes the oldest assumption in computing: that memory is passive and the CPU does the work. In modern AI and consumer workloads, 60–90% of total system energy is data movement, not computation — a DRAM access costs 800× a floating-point op, and up to 64,000× when sensors and storage are included. Meanwhile, state-of-the-art data-center processors spend 80–90% of their time waiting for memory. The implication: the AI compute shortage is substantially a memory-movement shortage, and the correct fix is not more cores but computation at the data.
Two commercial trajectories are converging on this idea. Near-memory approaches (HBM4E/C-HBM4E with logic on the base die, UPMEM-style bank-level processors, Samsung HBM-PIM, SK Hynix AiM) integrate conventional logic adjacent to memory. Using-memory approaches (RowClone, Ambit) exploit the analog behavior of DRAM cells to compute within the array — bulk copy via consecutive row activations, bitwise AND/OR/NOT/majority via concurrent multi-row activation. Critically, researchers (SAFARI / ETH Zurich, Mutlu group) have shown several of these ops run reliably on unmodified, commodity DRAM by violating nominal timing parameters.
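The PUM mechanism is easy to model: simultaneously activating three DRAM rows charge-shares them, and each bitline settles to the majority value, so AND and OR fall out by fixing one operand row to all-zeros or all-ones (the core Ambit idea). A minimal functional sketch, using Python integers as stand-ins for rows — an idealized model of the logic, not vendor code:

```python
# Idealized model of Ambit-style in-DRAM bitwise logic. Triple-row
# activation charge-shares three rows; each bitline settles to the
# bitwise majority of the three stored values.
def maj3(a: int, b: int, c: int) -> int:
    """Bitwise majority of three same-width integers."""
    return (a & b) | (b & c) | (a & c)

def in_dram_and(a: int, b: int) -> int:
    return maj3(a, b, 0)    # third row pre-initialized to all zeros

def in_dram_or(a: int, b: int) -> int:
    return maj3(a, b, ~0)   # third row pre-initialized to all ones

a, b = 0b1100, 0b1010
assert in_dram_and(a, b) == a & b   # 0b1000
assert in_dram_or(a, b) == a | b    # 0b1110
```

NOT comes from the sense amplifier's complementary bitline, which together with majority makes the substrate functionally complete.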
Reliability scaling is forcing intelligence into memory regardless of whether the industry wants PIM. RowHammer, RowPress, and column-disturbance mechanisms make modern DRAM a physical-security liability without on-die logic. DDR5 already embeds activation counters; this is the first rung on a ladder that ends at Self-Managing DRAM — memory that schedules its own refresh and defenses and can signal "not now" to the CPU.
The bottleneck is no longer physics — it's paradigm. JEDEC (~390 companies) rarely converges on radical interface changes; the "Self-Managing DRAM" paper was rejected six times over three and a half years. Onur Mutlu frames the full shift as a "Copernican Revolution" likely to take decades. Near-term, the investable surface is narrower: HBM4E/C-HBM4E base-die compute, commercial PIM DRAM (HBM-PIM, AiM, UPMEM), CXL 3.x composable memory, and hyperscaler topology-aware software stacks that can exploit any of it.
Key Claims
- 60–90% of system energy is data movement across consumer apps (Chrome, video codecs, TF inference) and ML workloads (LSTMs, transducers). Evidence: moderate — single-researcher framing, needs independent corroboration (Mutlu synthesis)
- 80–90% of data-center processor time is spent waiting on memory. Evidence: moderate, attributed to Google (Mutlu synthesis)
- DRAM access ≈ 800× FP op, 6,400× int add, 64,000× including storage/sensors. Evidence: moderate (Mutlu synthesis)
- RowClone and Ambit operate on unmodified DRAM by violating nominal timing parameters. Evidence: moderate, demonstrated in lab (Mutlu synthesis)
- HBM4E base dies on TSMC N3P enable C-HBM4E near-memory compute — one of three C-HBM4E customization levels. Evidence: strong (HBM4 Shakeup)
- DDR5 embeds activation counters for RowHammer defense. Evidence: strong — ratified spec (Mutlu synthesis)
- UPMEM acquired by Qualcomm (June 2025) — RISC-V DPU cores embedded in standard DDR4/DDR5 DIMMs; documented speedups of up to 259× for large-batch MLP inference. Evidence: strong — acquisition confirmed by PitchBook and CB Insights; technical architecture documented in multiple arXiv surveys (Mutlu synthesis)
- Samsung HBM-PIM (Aquabolt-XL) — 2.5× system performance, 60% energy reduction measured on Xilinx Virtex Ultrascale+ (Alveo) AI accelerator. Evidence: strong — Samsung + Hot Chips 33 disclosed (Mutlu synthesis)
- SK Hynix AiMX card (32 GB, GDDR6-AiM) ran Llama 2 70B at Hot Chips 2024 / AI HW Summit 2024; 1.25 V operating voltage vs 1.35 V standard → ~80% data-movement power reduction. Evidence: strong — SK Hynix technical disclosure (Mutlu synthesis)
- RowPress documented on commodity DDR4 — bit flips with orders-of-magnitude fewer activations than RowHammer. Evidence: strong — arXiv:2406.16153 (Luo et al., SAFARI) (Mutlu synthesis)
- Self-Managing DRAM framework for in-DRAM autonomous operations — appeared at MICRO 2024 (Yaglikci, Luo, Mutlu). Evidence: strong — peer-reviewed venue (Mutlu synthesis)
- RowHammer paper won 2024 Jean-Claude Laprie Award for dependable computing — signals field-wide recognition of the reliability angle. Evidence: strong (Mutlu synthesis)
Two Approaches Compared
| Dimension | Processing Near Memory (PNM) | Processing Using Memory (PUM) |
|---|---|---|
| Where compute lives | Logic layer in 3D-stack / on HBM base die / per DRAM bank | Inside the DRAM cell array itself |
| Logic used | Conventional ALUs, processors | Row activations, charge sharing |
| Examples | C-HBM4E NMC, HBM-PIM, AiM, UPMEM | RowClone, Ambit |
| Workload fit | General-purpose + ML kernels | Bulk memcpy, bitwise ops, RNG |
| Productization status | Commercial (Samsung, SK Hynix, UPMEM) + roadmapped (C-HBM4E 2026–27) | Research + unmodified-DRAM demonstrations |
| Programming model | Firmware + runtime extensions; CUDA-adjacent | New primitives; ISA-level changes required |
| Key barrier | Topology-aware software, yield, heat | JEDEC interface, determinism guarantees |
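The PNM programming model in the table reduces to a scatter-compute-reduce pattern: partition data across per-bank processors, run the same kernel near each bank, and combine partials on the host. A minimal sketch of that pattern (hypothetical API, loosely modeled on UPMEM's host/DPU split, with threads standing in for DPUs):

```python
# Sketch of the near-memory offload pattern: scatter data to per-bank
# processors, run the kernel near each bank, reduce partials on the host.
from concurrent.futures import ThreadPoolExecutor

def dpu_kernel(chunk: list[int]) -> int:
    # Runs "inside" one memory bank, e.g. a dot-product partial.
    return sum(x * x for x in chunk)

def pim_offload(data: list[int], n_banks: int = 8) -> int:
    chunks = [data[i::n_banks] for i in range(n_banks)]   # scatter to banks
    with ThreadPoolExecutor(n_banks) as pool:             # stand-in for DPUs
        partials = pool.map(dpu_kernel, chunks)
    return sum(partials)                                  # host-side reduce

print(pim_offload(list(range(100))))  # → 328350
```

The design point the table calls "topology-aware software" is exactly the scatter step: the host must know which bank holds which chunk, since DPUs cannot read each other's banks.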
Energy Arithmetic
| Operation | Relative Energy (64-bit FP op = 1×) |
|---|---|
| 64-bit FP multiply-add | 1× |
| 32-bit integer add | ~0.1× |
| DRAM read/write (access) | 800× (≈ 6,400× a 32-bit int add) |
| DRAM + storage + sensor chain | ~64,000× |
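The table's ratios can be sanity-checked with a back-of-envelope model: for a kernel performing N FP ops per DRAM access, what share of energy is data movement? A minimal sketch using only the two ratios above; the operating points are illustrative, not measured workloads:

```python
# Back-of-envelope data-movement energy share, in relative units
# (64-bit FP op = 1x, DRAM access = 800x, per the table above).
FP_OP, DRAM_ACCESS = 1.0, 800.0

def data_movement_share(fp_ops_per_access: float) -> float:
    compute = fp_ops_per_access * FP_OP
    movement = DRAM_ACCESS
    return movement / (compute + movement)

# Even at 100 FP ops per DRAM access, movement still dominates:
print(round(data_movement_share(100), 2))   # → 0.89
print(round(data_movement_share(1000), 2))  # → 0.44
```

At realistic arithmetic intensities for memory-bound kernels, this lands squarely in the 60–90% range claimed above, which is why the claim is plausible even before independent replication.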
Reliability Pressure Forcing Memory Intelligence
| Mechanism | What it is | Status |
|---|---|---|
| RowHammer | Repeated row activations leak charge into adjacent rows → bit flips | Exploited in the wild; DDR5 activation counters partially mitigate |
| RowPress | Long-held row activation induces flips with orders-of-magnitude fewer activations | Recently discovered; no standardized defense yet |
| Column disturbance | Disturbance through column operations that can affect thousands of rows simultaneously | Newly identified |
| DDR5 activation counters | On-die logic triggers adjacent-row refresh | Shipping |
| Self-Managing DRAM | Memory schedules own refresh/defense, can defer CPU requests | Research; Mutlu paper accepted after 6 rejections |
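The DDR5 counter mechanism in the table is conceptually simple: count activations per row and refresh the physical neighbors when a threshold is crossed. A simplified sketch of that logic (the threshold and refresh policy here are illustrative, not the JEDEC-specified mechanism):

```python
# Simplified per-row activation counting with adjacent-row (victim)
# refresh, in the spirit of DDR5's on-die RowHammer mitigation.
from collections import defaultdict

THRESHOLD = 4  # illustrative; real disturbance thresholds are far higher

class BankCounters:
    def __init__(self) -> None:
        self.acts: dict[int, int] = defaultdict(int)

    def on_activate(self, row: int) -> list[int]:
        """Count an activation; past the threshold, reset the counter
        and return the neighbor rows that must be refreshed."""
        self.acts[row] += 1
        if self.acts[row] >= THRESHOLD:
            self.acts[row] = 0
            return [row - 1, row + 1]
        return []

bank = BankCounters()
for _ in range(3):
    assert bank.on_activate(7) == []
assert bank.on_activate(7) == [6, 8]  # victim refresh triggered
```

Note why this fails against RowPress: the attack holds a row open for a long time rather than activating it repeatedly, so an activation counter never fires — which is the argument for the more general Self-Managing DRAM approach.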
Commercial Signals to Track
- UPMEM → Qualcomm acquisition (June 2025, confirmed) — first pure-play PIM exit to a major semi. Tech: RISC-V DPU cores per DDR4/DDR5 DIMM. Watch for post-deal product announcements integrating UPMEM into Qualcomm's AI-PC / edge-AI / data-center stack.
- Samsung HBM-PIM (Aquabolt-XL) — HBM2-based, validated at 2.5× perf and 60% energy cut on Xilinx Alveo. No confirmed hyperscaler production deployment yet — watch for HBM3/HBM4-PIM successor products as Samsung scales HBM capacity +50% in 2026.
- SK Hynix AiMX (GDDR6-AiM) — 32 GB card running Llama 2 70B demoed at Hot Chips 2024; 80% data-movement power savings. Watch for LPDDR6-AiM variant targeting on-device AI (CES 2026 roadmap hint).
- C-HBM4E near-memory compute — level-3 customization in the HBM4E roadmap; hyperscaler topology-aware stacks are the gating item. NVIDIA Rubin Ultra (1 TB HBM4E, 8 stacks) is the natural first host.
- CXL 3.x composable memory — memory disaggregation sets the interface conditions for memory-centric design at rack scale.
- JEDEC DDR6 spec drafts — watch for activation counter evolution and any hint of CPU-memory interface flexibility (Self-Managing DRAM primitives would be the breakthrough signal).
- RowPress and column-disturbance exploits in the wild — first published CVE will force a JEDEC response and validate the "reliability forces intelligence" thesis.
Investment Implications (preliminary)
- Reframes NVIDIA's moat from FLOPs to memory-system integration (NVLink, Grace–Hopper coherence, rack-scale memory bandwidth).
- Elevates memory makers — SK Hynix, Samsung Memory, Micron — from capacity suppliers to active-compute participants as C-HBM4E NMC and HBM-PIM/AiM gain traction.
- Fabric & interface layer (Astera Labs, Marvell, Broadcom, Rambus, Synopsys/Cadence IP) is the arbitrage seat as CXL 3.x and custom D2D interfaces proliferate.
- Standards-body dynamics are a genuine alpha source: JEDEC activity is a forward indicator 3–5 years out.
- Pure-play PIM is thin — UPMEM was the notable independent; post-acquisition this is mostly a "hidden inside an incumbent" story.
Open Questions
- Do the 60–90% data-movement energy figures replicate on independent hyperscaler workloads (Meta, Microsoft benchmarks)?
- Does C-HBM4E NMC find a production workload beyond vector-DB / recommendation serving, or does it stay a niche accelerator?
- What's the realistic 3-year PIM TAM — is it a 5% sidecar to HBM, or a 30%+ reshaping of memory ASP?
- Do hyperscaler custom stacks (TPU, Trainium, MTIA, Maia) get first or last access to C-HBM4E topology-aware software?
- Does a CXL 3.x-native memory-centric reference architecture emerge, or does it stay a loose constellation of vendor-specific features?
- What's the attacker's-view timeline on RowPress exploits in the wild, and does it force a JEDEC spec revision?
Related Concepts
- HBM4 Memory Architecture — C-HBM4E near-memory compute is the first commercial wedge of PIM into mainstream AI accelerators
- Custom Silicon vs GPU — memory-system integration is where ASICs either catch NVIDIA or don't
- Rack-Scale AI Compute — rack-as-product design lets hyperscalers ship topology-aware memory software
Changelog
- 2026-04-21 — Initial compilation from Mutlu (ETH Zurich / SAFARI) synthesis. Cross-linked to HBM4 concept since C-HBM4E NMC is the commercial bridge.
- 2026-04-21 — Primary-source pass: UPMEM/Qualcomm acquisition confirmed (June 2025); Aquabolt-XL numbers (2.5× / 60%) and AiMX/Llama 2 70B demo added; RowPress (arXiv:2406.16153) and Self-Managing DRAM (MICRO 2024) cited; RowHammer paper's 2024 Jean-Claude Laprie Award noted. Evidence levels upgraded accordingly.