A Study of DRAM Optimization to Break the Memory Wall

Zhang, Tao

Open Access

Author:: Zhang, Tao
Graduate Program:: Computer Science and Engineering
Degree:: Doctor of Philosophy
Document Type:: Dissertation
Date of Defense:: March 06, 2014
Committee Members:: Yuan Xie, Dissertation Advisor/Co-Advisor
Yuan Xie, Committee Chair/Co-Chair
Mary Jane Irwin, Committee Member
Vijaykrishnan Narayanan, Committee Member
Zhiwen Liu, Committee Member
Raj Acharya, Committee Member
Keywords:: DRAM
Memory Wall
3D-stacked DRAM
Wide IO
Activation
Precharge
Refresh
Sub-array Level Parallelism
Abstract:: The well-known “Memory Wall” was identified in the 1990s. At that time, researchers noticed that the performance of processors and the performance of main memory were increasing exponentially at diverging rates, and thus predicted that main memory would eventually become the bottleneck of the entire computing system. Furthermore, thanks to semiconductor process scaling, the number of transistors on a single chip keeps growing. As a result, processors have entered the multi-/many-core era, and instruction-level parallelism (ILP) and thread-level parallelism (TLP) have been extensively exploited. This increased parallelism requires the memory to provide low latency, high bandwidth, and low power consumption. Unfortunately, DRAM, the de facto main memory technology, has evolved relatively slowly because of its poor scalability and its extreme sensitivity to design cost (cost per bit). Consequently, we are hitting the “Memory Wall”. To break the memory wall, a large body of research has proposed optimizing the DRAM architecture to achieve a better trade-off among performance, power, and area overhead. Some of these proposals, however, are difficult to implement because of either unaffordable design overhead or unacceptable performance degradation. Moreover, new issues have emerged as DRAM has evolved. For example, the performance impact of refresh can no longer be ignored, as refresh can significantly degrade DRAM performance. DRAM power consumption is also critical: DRAM modules, which are deployed in massive numbers in data centers, can consume up to 40% of the total power. Therefore, much research effort is still needed in the DRAM realm to ensure that DRAM can win the war against the “Memory Wall”. This is the motivation for this dissertation, in which the author proposes several novel DRAM architectures that aim at a better trade-off among DRAM performance, power, and design overhead.
Both traditional DRAM technologies and the emerging 3D-stacked DRAMs are covered in this work. To relieve the refresh penalty in commodity DRAM, a concurrent refresh-aware memory (CREAM) is proposed; CREAM allows refresh and memory accesses to be issued in parallel. In addition, a new precharge policy, Lazy Precharge, is introduced to minimize precharge overhead: by leveraging Lazy Precharge, multiple activations can share one precharge, so the number of precharges is significantly reduced. Furthermore, Half-DRAM is proposed to achieve power reduction and performance improvement simultaneously. In Half-DRAM, a bank is redesigned so that it can activate only half of a row, which reduces activation and precharge power. In this way, Half-DRAM can readily eliminate the power constraints on activation and thereby unleash performance. Moreover, once sub-array-level parallelism is deployed, two half-rows can be accessed in parallel to further improve performance. Beyond traditional 2D DRAM, a novel 3D Wide IO DRAM architecture is proposed to increase DRAM parallelism in Wide IO. To take advantage of the increasing wiring resources in the vertical dimension, each bank is further split into multiple sub-banks, each of which can provide a full cache line. Finally, a 3D SoC chip has been taped out to demonstrate the feasibility of 3D-stacked DRAM; in this chip, a 3D-stacked DRAM is stacked directly on a two-layer logic chip. The experimental results show that the proposed optimizations either effectively improve DRAM performance or significantly reduce DRAM power, with negligible area overhead.
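The Half-DRAM argument, that activating only half a row lowers activation power and so relaxes the power constraint on how often rows may be activated, can be illustrated with a toy scheduler. This is a minimal sketch, not the dissertation's model: it assumes a tFAW-style rolling window that caps the total activation "cost" issued within the window, and it assumes a half-row activation costs exactly half of a full-row activation. The class `ActivationBudget`, its parameters, and all numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ActivationBudget:
    """Toy power constraint (hypothetical): within any rolling window of
    `window_ns`, the summed cost of issued activations must not exceed
    `budget_units` (loosely inspired by DRAM's tFAW window)."""
    window_ns: float
    budget_units: float

    def schedule(self, requests: int, cost_units: float, spacing_ns: float) -> list[float]:
        """Greedily issue `requests` activations, each at least `spacing_ns`
        after the previous one (a tRRD-like gap), delaying any activation
        that would exceed the window budget. Returns issue times in ns."""
        if cost_units > self.budget_units:
            raise ValueError("a single activation exceeds the window budget")
        issued: list[float] = []
        t = 0.0
        for _ in range(requests):
            while True:
                # Activations still inside the window (t - window_ns, t].
                in_window = [s for s in issued if s > t - self.window_ns]
                if (len(in_window) + 1) * cost_units <= self.budget_units:
                    break
                # Wait until the oldest activation in the window expires.
                t = in_window[0] + self.window_ns
            issued.append(t)
            t += spacing_ns
        return issued


if __name__ == "__main__":
    budget = ActivationBudget(window_ns=30.0, budget_units=4.0)
    full = budget.schedule(8, cost_units=1.0, spacing_ns=5.0)  # full-row activations
    half = budget.schedule(8, cost_units=0.5, spacing_ns=5.0)  # half-row activations
    print("full-row issue times:", full)
    print("half-row issue times:", half)
```

With a budget of four full-row activations per 30 ns window and a 5 ns minimum gap, the fifth full-row activation must wait for the window to slide, so eight activations finish issuing at 45 ns; under the assumed half cost, eight half-row activations issue back-to-back and finish at 35 ns. The greedy, in-order policy is chosen only for clarity; a real memory controller would also reorder requests across banks.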