ORCHESTRATING THE COMPILER AND MICROARCHITECTURE FOR REDUCING CACHE ENERGY
Open Access
- Author:
- Hu, Jie
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 29, 2004
- Committee Members:
- Mahmut Taylan Kandemir, Committee Member
Mary Jane Irwin, Committee Member
Vijaykrishnan Narayanan, Committee Chair/Co-Chair
Yuan Xie, Committee Member
Richard R Brooks, Committee Member
- Keywords:
- Power-Aware Systems Design
Energy-Efficient Cache Architectures
Computer Architecture
Compiler Optimization
Characterization
Algorithm
Performance
- Abstract:
- Cache memories are widely employed in modern microprocessor designs to bridge the growing speed gap between the processor and off-chip main memory, which constitutes the major performance bottleneck in computer systems. Consequently, caches consume a significant share of the transistor budget and chip die area in microprocessors used in both low-end embedded systems and high-end server systems. As a major consumer of on-chip transistors, and thus of the power budget, cache memory deserves a fresh and thorough study of its performance and energy behavior, along with new techniques for designing cache memories for next-generation microprocessors. This thesis focuses on developing compiler and microarchitecture techniques for designing energy-efficient caches, targeting both dynamic and leakage energy, and makes four major contributions toward energy-efficient cache architectures. First, a detailed cache behavior characterization was performed for both array-based embedded applications and general-purpose applications. The insights obtained from this study suggest that (1) different applications, or different code segments within a single application, have very different cache demands with respect to performance and energy, (2) program execution footprints (instruction addresses) can be highly predictable and usually have a narrow scope during a particular execution phase, especially for embedded applications, and (3) accesses to the instruction cache exhibit high sequentiality. Second, a technique called compiler-directed cache polymorphism (CDCP) was proposed. CDCP analyzes the data reuse exhibited by loop nests in order to extract their cache demands and determine the best data cache configuration for each code segment, achieving the best performance and energy behavior.
Third, this thesis presents a redesigned processor datapath that captures and exploits the predictable execution footprint to reduce energy consumption in the instruction cache. Finally, this thesis addresses the growing leakage concern in the instruction cache by exploiting cache hotspots during phase execution and the sequentiality exhibited in the execution footprint.
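The CDCP idea above can be sketched in miniature: estimate a loop nest's data working set from compiler reuse analysis, then select the smallest cache configuration that still captures the reuse. The configuration table, energy numbers, and selection heuristic below are illustrative assumptions for exposition only, not the thesis's actual algorithm.

```python
# Toy sketch of compiler-directed cache polymorphism (CDCP): pick a
# per-code-segment data cache configuration from an estimated working set.
# All design points and energy figures here are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class CacheConfig:
    size_bytes: int      # total capacity of this configuration
    assoc: int           # associativity
    rel_energy: float    # relative dynamic energy per access (assumed)

# Hypothetical configurable-cache design points, cheapest first.
CONFIGS = [
    CacheConfig(4 * 1024, 1, 0.4),
    CacheConfig(8 * 1024, 2, 0.6),
    CacheConfig(16 * 1024, 2, 0.8),
    CacheConfig(32 * 1024, 4, 1.0),
]

def working_set_bytes(array_footprints):
    """Sum of per-array footprints (bytes) touched across the reuse
    distance of a loop nest -- a crude stand-in for compiler reuse analysis."""
    return sum(array_footprints)

def choose_config(array_footprints):
    """Pick the smallest (lowest-energy) configuration whose capacity
    holds the estimated working set; fall back to the largest otherwise."""
    ws = working_set_bytes(array_footprints)
    for cfg in CONFIGS:
        if cfg.size_bytes >= ws:
            return cfg
    return CONFIGS[-1]

# Example: a loop nest touching two 3 KB arrays has a ~6 KB working set,
# so the 8 KB configuration suffices and the larger, costlier ones are idle.
cfg = choose_config([3 * 1024, 3 * 1024])
print(cfg.size_bytes, cfg.rel_energy)
```

In this toy model, downsizing the cache for small-footprint loop nests trades unused capacity for lower per-access energy, which is the intuition behind matching cache configurations to code-segment demands.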