Rethinking the memory hierarchy design with nonvolatile memory technologies

Open Access
Zhao, Jishen
Graduate Program: Computer Science and Engineering
Degree: Doctor of Philosophy
Document Type: Dissertation
Date of Defense: March 03, 2014
Committee Members:
  • Yuan Xie, Dissertation Advisor, Committee Chair
  • Mary Jane Irwin, Committee Member
  • Vijaykrishnan Narayanan, Committee Member
  • Zhiwen Liu, Committee Member
  • Onur Mutlu, Special Member
Keywords:
  • Memory hierarchy
  • Nonvolatile memory
  • Persistence
  • Memory/storage stack
  • Energy efficiency
  • Graphics memory
  • CMP
  • GPU
The memory hierarchy, including processor caches and main memory, is a critical component of modern computer systems. It is becoming a fundamental performance and energy bottleneck, due to the widening gap between the bandwidth and energy demands of modern applications and the limited performance and energy efficiency of traditional memory technologies. As a result, computer architects face significant challenges in developing high-performance, energy-efficient, and reliable memory hierarchies. Emerging byte-addressable nonvolatile memories (NVRAMs) have unique properties that open the door to novel memory hierarchy designs that tackle these challenges. Realizing their full potential, however, requires substantially redesigning existing memory hierarchy organizations. This dissertation focuses on re-architecting the memory hierarchy with NVRAMs, producing high-performance, energy-efficient memory designs for both CPU and graphics processing unit (GPU) systems.

The first contribution of this dissertation is a novel bandwidth-aware reconfigurable cache hierarchy built from hybrid memory technologies, which enhances the system performance of chip multiprocessors (CMPs). In CMP designs, limited memory bandwidth is a potential performance bottleneck, and NVRAMs promise high-bandwidth cache solutions. By combining different memory technologies, our hybrid cache hierarchy optimizes the peak bandwidth at each cache level. Furthermore, we develop a reconfiguration mechanism that dynamically adapts the capacity of each cache level based on the predicted bandwidth demands of different applications. This dissertation also explores energy-efficient graphics memory design for GPU systems.
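The flavor of the reconfiguration mechanism can be illustrated with a minimal, hypothetical sketch. The function name, the notion of "capacity units," and the demand-per-capacity pressure metric below are illustrative assumptions, not the dissertation's actual design:

```python
# Hypothetical sketch of bandwidth-aware cache reconfiguration:
# shift capacity toward the cache level whose predicted bandwidth
# demand is highest relative to its current allocation.
# All names and units are illustrative assumptions.

def reconfigure(levels, predicted_demand, step=1):
    """Rebalance capacity across cache levels.

    levels: dict of level name -> allocated capacity units
    predicted_demand: dict of level name -> predicted bandwidth demand
    """
    # "Pressure" = predicted demand per allocated capacity unit.
    pressure = {lv: predicted_demand[lv] / max(levels[lv], 1) for lv in levels}
    hottest = max(pressure, key=pressure.get)
    coldest = min(pressure, key=pressure.get)
    # Move one capacity unit from the least- to the most-pressured level.
    if hottest != coldest and levels[coldest] > step:
        levels[coldest] -= step
        levels[hottest] += step
    return levels

# Example: L2 is under far more bandwidth pressure than L3,
# so one capacity unit migrates from L3 to L2.
caps = reconfigure({"L2": 4, "L3": 8}, {"L2": 100, "L3": 20})
```

A real design would, of course, reconfigure in hardware at coarse intervals using predicted (not just observed) demand, as the dissertation describes.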
We develop a hybrid graphics memory architecture that employs NVRAMs alongside the traditional memory technology used in graphics memories, improving overall memory bandwidth and reducing the power dissipation of GPU systems. In addition, we design an adaptive data migration mechanism that further reduces graphics memory power dissipation without hurting GPU system performance, by exploiting the varied memory access patterns of workloads running on GPUs.

Finally, this dissertation discusses how to re-architect the current memory/storage stack around a persistent memory system to achieve efficient and reliable data movement. First, we propose a hardware-based, high-performance persistent memory design. Persistent memory allows in-memory persistent data objects to be updated at much higher throughput than when disks serve as persistent storage. Most previous persistent memory designs approach the problem from a software perspective and, unfortunately, reduce system performance to roughly half that of a traditional memory system without persistence support. A central challenge in this application class is therefore how to enable data persistence in memory efficiently. This dissertation proposes a persistent memory design rooted in a hardware perspective, offering several practical advantages: a simple and intuitive abstract interface, microarchitecture-level optimizations, fast recovery from failures, and elimination of redundant writes to nonvolatile storage media. In addition, this dissertation presents a fair, high-performance memory scheduling scheme for persistent memory systems. It tackles the problem raised by a memory interface shared between accesses with and without persistence requirements, achieving both fair memory access and high system throughput for co-running applications.
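Among the mechanisms above, the adaptive data migration idea can be sketched roughly as a placement policy driven by observed access patterns. The function, thresholds, and technology labels below are hypothetical illustrations, not the dissertation's actual mechanism:

```python
# Hypothetical sketch of adaptive data migration in a hybrid
# graphics memory: keep write-heavy pages in DRAM-like memory
# (writes to NVRAM are slow and energy-costly) and migrate
# read-mostly or cold pages to NVRAM to cut standby power.
# The threshold value is an illustrative assumption.

def place_page(reads, writes, write_threshold=16):
    """Decide a page's placement from its observed access counts."""
    if writes >= write_threshold:
        return "DRAM"   # write-heavy: avoid NVRAM's costly writes
    return "NVRAM"      # read-mostly or cold: exploit low standby power
```

A real mechanism would sample access counters periodically and migrate pages incrementally so migration traffic itself does not hurt GPU performance, which is the constraint the dissertation emphasizes.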
Our key observation is that write operations are also on the critical execution path of persistent applications. This dissertation therefore introduces a new scheduling policy that balances the service of memory requests from different workloads, and a strided logging mechanism that accelerates writes to persistent memory by increasing their bank-level parallelism.
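To make the bank-level-parallelism idea concrete, here is a minimal, hypothetical sketch of strided log placement. The mapping below (simple round-robin striping of log entries across banks) captures the general idea only; the bank count and function are illustrative assumptions, and the dissertation's actual mechanism differs in detail:

```python
# Hypothetical sketch of strided logging: consecutive log entries
# are strided across memory banks so their writes can be serviced
# in parallel, rather than serializing on a single bank.
# NUM_BANKS is an illustrative assumption.

NUM_BANKS = 8

def strided_bank(log_index, num_banks=NUM_BANKS):
    """Map the i-th log entry to a bank in round-robin (strided) order."""
    return log_index % num_banks

# With this mapping, eight consecutive log writes land on eight
# different banks and can proceed concurrently.
banks = [strided_bank(i) for i in range(8)]
```

The design intuition is that sequential log appends would otherwise hammer one bank at a time; striping converts that serial write stream into bank-parallel traffic the memory controller can overlap.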