Memory-Centric Computing: A Paradigm Shift for Sustainable and Efficient Systems
Argues processor-centric computing is fundamentally broken: 60–90% of system energy is data movement, not compute; DRAM can compute (RowClone, Ambit); reliability threats (RowHammer, RowPress, column disturbance) force memory intelligence anyway; JEDEC + trillion-dollar incumbency is the bottleneck, not technology
Executive Summary
The current computing paradigm is fundamentally flawed by its "processor-centric" nature, which treats memory as a passive, "dumb" storage device. This architecture necessitates constant data movement between storage, memory, and processor, producing massive energy waste, performance bottlenecks, and reliability issues. Between 60% and 90% of total system energy in modern workloads — from mobile browsing to large-scale ML — is consumed by data movement across the memory hierarchy.
Transitioning to "memory-centric" computing means (1) enabling memory to perform active computation (Processing-In-Memory / PIM) and (2) letting memory autonomously manage its own maintenance and reliability (Self-Managing DRAM). Orders-of-magnitude improvements in efficiency are technically evident, but three barriers remain: a trillion-dollar processor-centric industry, rigid JEDEC interface standards, and conservative academic/industrial review processes.
The Crisis of Processor-Centric Computing
Modern systems allocate ~95% of hardware real estate to storing and moving data, yet these functions remain the primary bottlenecks.
Performance & Energy Bottlenecks
- The waiting processor: Data from Google indicates state-of-the-art data-center processors spend 80–90% of their time waiting for memory loads; instructions execute only 10–20% of the time.
- Energy disparity: moving data dwarfs the cost of computing on it.
  - DRAM read/write ≈ 800× the energy of a 64-bit double-precision FP op.
  - DRAM read/write ≈ 6,400× the energy of a 32-bit integer add.
  - Including the storage and sensor path, data movement costs ~64,000× the energy of the computation itself.
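A quick back-of-the-envelope check of these ratios, sketched in Python. The ratios are the ones quoted above (ultimately from keynote figures; absolute picojoule values shift with process node), and the streaming-sum workload and "no cache" assumption are illustrative, not from the source:

```python
# Normalize everything to the energy of one 64-bit FP op = 1 unit.
FP64_OP = 1.0
DRAM_ACCESS = 800 * FP64_OP       # DRAM read/write ~= 800x an FP64 op
INT32_ADD = DRAM_ACCESS / 6400    # the 6,400x figure is relative to a 32-bit add
FULL_PATH = 64000 * INT32_ADD     # storage + sensor + DRAM path vs. the add itself

def energy_to_sum(n_elements: int) -> dict:
    """Energy to sum n values streamed from DRAM vs. the adds themselves.

    Assumes every element is fetched from DRAM once (no caching) -- an
    illustrative worst case, not a claim about any real memory hierarchy.
    """
    movement = n_elements * DRAM_ACCESS
    compute = (n_elements - 1) * INT32_ADD
    return {
        "movement": movement,
        "compute": compute,
        "movement_share": movement / (movement + compute),
    }

stats = energy_to_sum(1_000_000)
print(f"data movement share: {stats['movement_share']:.4%}")
```

Under these ratios, essentially all of the energy for a streamed reduction is data movement, which is the point the bullet list is making.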
Real-World Workloads (Google joint study)
- Consumer apps: >60% of total system energy in Chrome browsing, video encode/decode, and TensorFlow inference is data movement.
- ML models: >90% of total system energy for LSTMs and transducers is memory + off-chip interconnect.
Reliability & Scaling
Scaling DRAM to smaller nodes increases susceptibility to physical disturbance — forcing intelligence into memory controllers.
- RowHammer — repeated row activations cause charge leakage that flips bits in adjacent rows. Exploited for sandbox breakouts, cryptographic-key corruption, unauthorized access.
- RowPress — keeping a row active for extended periods induces bit flips with orders of magnitude fewer activations than RowHammer.
- Column disturbance — newly identified mechanism affecting thousands of rows simultaneously.
Industry patches: DDR5 now includes activation counters that trigger adjacent-row refreshes. These are patches, not fundamental shifts toward autonomous memory.
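The activation-counter idea can be sketched functionally. This is a toy model, not the JEDEC mechanism: the threshold value, the reset policy, and the "adjacent rows only" victim model are all illustrative assumptions.

```python
from collections import defaultdict

class ActivationCounterMitigation:
    """Toy per-row activation counter with victim-row refresh.

    Hedged sketch: real DDR5 refresh management differs in counter
    placement, thresholds, and victim selection.
    """

    def __init__(self, threshold: int = 4096):
        self.threshold = threshold
        self.counts = defaultdict(int)       # activations per row this window
        self.victim_refreshes = []           # log of (aggressor, victims)

    def activate(self, row: int) -> None:
        self.counts[row] += 1
        if self.counts[row] >= self.threshold:
            victims = [row - 1, row + 1]     # assume only adjacent rows are at risk
            self.victim_refreshes.append((row, victims))
            self.counts[row] = 0             # counter cleared after mitigation

    def on_refresh_window(self) -> None:
        self.counts.clear()                  # all counters reset each refresh window

# Hammering one row repeatedly triggers periodic victim refreshes:
mit = ActivationCounterMitigation(threshold=3)
for _ in range(7):
    mit.activate(42)
```

Note how the mitigation logic lives with the memory, not the CPU: the counter state and the refresh decision are local, which is exactly the "intelligence in memory" direction the section argues these patches gesture toward.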
The Solution: Processing-In-Memory (PIM)
Processing Near Memory (PNM)
Places traditional logic close to or inside the memory chip (e.g., 3D-stacked memory with logic layer).
- UPMEM (acquired by Qualcomm, June 2025) designs DRAM chips with a general-purpose multi-threaded processor per bank.
- Potential: distributed compute across memory controllers, accelerating LLMs to the point of "GPU-free" systems.
Processing Using Memory (PUM)
Exploits the analog properties of memory cells to compute with minimal additional logic.
- RowClone: consecutive DRAM activates copy data row-to-row within a subarray without CPU involvement — major memcpy energy/latency reduction.
- Ambit: concurrent activation of multiple rows enables bitwise majority / AND / OR / NOT — a "bulk bitwise computation engine" inside DRAM.
- Real-world validation: tests on existing, unmodified DRAM chips — operated with deliberately violated timing parameters — show that copy, logical operations, and true random number generation can often be performed reliably, with significant throughput gains.
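Ambit's core trick can be modeled functionally: triple-row activation computes a bitwise majority of three rows, and AND/OR fall out by fixing the third row to all-zeros or all-ones. A minimal sketch (a functional model of the published technique, not silicon behavior; bit vectors are represented as Python ints):

```python
def maj3(a: int, b: int, c: int) -> int:
    """Bitwise majority of three equal-width bit vectors (as ints)."""
    return (a & b) | (b & c) | (a & c)

def ambit_and(a: int, b: int) -> int:
    """AND via triple-row activation with a control row of all zeros."""
    return maj3(a, b, 0)

def ambit_or(a: int, b: int, width: int = 8) -> int:
    """OR via triple-row activation with a control row of all ones."""
    return maj3(a, b, (1 << width) - 1)

a, b = 0b1100_1010, 0b1010_0110
assert ambit_and(a, b) == a & b
assert ambit_or(a, b) == a | b
```

The point of the model: one analog operation (simultaneous activation) yields a whole row's worth of bitwise results, which is where the "bulk" in bulk bitwise computation comes from. (In the actual design, NOT uses a separate dual-contact cell rather than multi-row activation.)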
Architectural & Theoretical Shifts
| Concept | Processor-Centric (now) | Memory-Centric (proposed) |
|---|---|---|
| Interface | Rigid; CPU controls all memory timing (e.g., refresh every 7.8µs) | Self-Managing DRAM: memory can signal "no" to CPU during internal tasks (refresh, RowHammer protection) |
| Complexity theory | Big-O on processor operations | Data-centric complexity models that count data movement |
| System roles | CPU/accelerator = master, memory = slave | Distributed system of equal, coordinating agents |
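The Self-Managing DRAM row of the table can be made concrete with a toy handshake: the memory may answer "not now" while it runs internal maintenance, instead of the CPU dictating all timing. The message names, maintenance schedule, and retry policy below are illustrative assumptions, not the proposed interface:

```python
class SelfManagingDRAM:
    """Toy memory that periodically goes busy for internal maintenance."""

    def __init__(self, maintenance_every: int = 5):
        self.tick = 0
        self.maintenance_every = maintenance_every

    def request(self, row: int) -> str:
        self.tick += 1
        if self.tick % self.maintenance_every == 0:
            return "BUSY"                 # memory says "no": maintenance in flight
        return f"DATA(row={row})"

class Controller:
    """Toy controller that retries instead of owning the memory's schedule."""

    def read(self, mem: SelfManagingDRAM, row: int, max_retries: int = 3) -> str:
        for _ in range(max_retries + 1):
            reply = mem.request(row)
            if reply != "BUSY":
                return reply
        raise TimeoutError("memory stayed busy")

mem = SelfManagingDRAM()
ctrl = Controller()
replies = [ctrl.read(mem, r) for r in range(8)]
```

The design point is the inversion of control: the memory decides when refresh or RowHammer protection runs, and the controller's contract shrinks to "retry on BUSY" rather than micromanaging every timing parameter.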
Barriers to Adoption
- Economic: trillion-dollar investment in processor-centric infrastructure creates entrenched dogma.
- Standards: JEDEC committee (~390 companies) rarely converges on radical interface changes.
- Review processes: Mutlu's "Self-Managing DRAM" paper was rejected 6× over 3.5 years before acceptance; revolutionary ideas are dismissed as "commercially unattractive" or requiring full-stack co-design.
- Mindset gap: device-level physics (aging, spatial variation) remains poorly understood even after ~60 years of DRAM use; device-level and system-level research remain siloed.
Conclusion
The author frames this as a "Copernican Revolution" taking decades to realize. Sustainability and energy constraints make the status quo increasingly untenable. Treating memory as a combined computation + storage substrate yields orders-of-magnitude efficiency gains — if the industry can overcome systemic and economic resistance to radical architectural change.
Key Numbers
- 80–90% — time data-center processors spend waiting for memory
- 60% — system energy spent on data movement in Chrome, video codecs, TF inference
- 90% — system energy spent on memory/interconnect for LSTMs and transducers
- 800× — DRAM access energy vs 64-bit FP op
- 6,400× — DRAM access energy vs 32-bit int add
- 64,000× — storage+sensor+DRAM access vs actual computation
- 95% — hardware real estate dedicated to storing/moving data
- 7.8µs — current DRAM refresh cadence (CPU-imposed)
- 390 — approximate number of JEDEC member companies
- 6 rejections / 3.5 years — Self-Managing DRAM paper acceptance history
Related Work to Ingest Next
- Mutlu group (SAFARI) arXiv papers: RowClone, Ambit, Self-Managing DRAM (MICRO 2024, Yaglikci/Luo/Mutlu), RowPress (arXiv:2406.16153), related survey (arXiv:2503.16749)
- UPMEM / Qualcomm acquisition primary source ✓ confirmed June 2025 — CB Insights, PitchBook, design-reuse
- Samsung HBM-PIM (Aquabolt-XL) — Hot Chips 33 paper, Samsung Newsroom; validated on Xilinx Alveo at 2.5× perf / 60% energy cut
- SK Hynix AiM — GDDR6-AiM + AiMX 32 GB card; Hot Chips 2024 / AI HW Summit 2024; Llama 2 70B demo
- RowHammer paper — won 2024 Jean-Claude Laprie Award (dependable computing)
- JEDEC DDR5 / DDR6 activation counter specifications
- CXL 3.x composable memory specifications
- PIM-AI architectures survey (arXiv:2411.17309) — LLM inference-specific PIM designs
Verification Pass (2026-04-21)
Cross-checked key claims against public sources; evidence upgraded in the Processing-In-Memory concept page:
- ✅ UPMEM → Qualcomm acquisition confirmed (June 2025)
- ✅ Samsung Aquabolt-XL: 2.5× perf, 60% energy cut on Xilinx Alveo
- ✅ SK Hynix AiMX: 32 GB card, Llama 2 70B, 80% data-movement power saving
- ✅ RowPress: arXiv:2406.16153, commodity DDR4 bit flips
- ✅ Self-Managing DRAM: MICRO 2024, Yaglikci/Luo/Mutlu
- ✅ RowHammer paper: 2024 Jean-Claude Laprie Award
- ⚠️ 60–90% energy figures: still single-source (Mutlu framing of Google joint study); independent hyperscaler replication not yet located
- ⚠️ 800× / 6,400× / 64,000× DRAM vs compute energy ratios: widely cited but upstream source is Horowitz 2014 ISSCC keynote — need to verify whether figures have shifted meaningfully with modern nodes