Open Access
Li, Feihui
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Date of Defense:
April 25, 2007
Committee Members:
  • Mahmut Taylan Kandemir, Committee Chair
  • Mary Jane Irwin, Committee Member
  • Yuan Xie, Committee Member
  • Kenan Unlu, Committee Member
Keywords:
  • Network-on-Chip
  • Chip Multiprocessor
  • cache
  • optimization
  • power
  • performance
As semiconductor technology scales into the deep sub-micron regime, billions of transistors can be packed onto a single chip. Traditional monolithic processor architectures, however, scale poorly with technology because of diminishing improvements in clock rates and increasing interconnect delay, so they cannot efficiently turn the abundant on-chip resources into computing capability. Chip Multiprocessors (CMPs), which integrate multiple relatively simple processing cores on a single chip, are becoming the trend in microprocessor design in both industry and academia.
Processors, interconnection networks, and memories constitute the three major components of a CMP architecture. This thesis optimizes two of them: the interconnection network and the memory subsystem. As the number of processing nodes in a CMP scales up, a new type of interconnection network, the Network-on-Chip (NoC), is normally employed. We therefore study the NoC as the emerging interconnection network for CMPs, and the on-chip level-2 (L2) Non-Uniform Cache Architecture (NUCA) as a critical component of the CMP memory subsystem. Targeting these components, this thesis proposes a set of hardware and software optimization schemes.
The first part of this thesis uses compiler-directed approaches to reduce the energy consumption of NoCs. Three compiler approaches are proposed: proactive communication link turn-on/off, compiler-directed voltage selection for communication links, and profile-driven message rerouting. Experimental results with array/loop-intensive applications demonstrate that the compiler-directed approaches reduce NoC energy consumption more effectively than pure hardware-based power management schemes.
The second part of this thesis targets the design and optimization of high-performance L2 NUCA for CMPs. Its contributions are a novel 3D NoC-bus hybrid NUCA design and a migration-based NUCA design. We demonstrate, through extensive experiments, that 3D circuit technology is quite effective in shortening wire delay and thus reduces the L2 NUCA access latency. The migration-based proposal is a careful migration scheme (eviction-triggered migration and access-triggered migration) that aims to find a proper physical location for each cache line in L2. Experimental results show that this scheme significantly improves L2 cache performance.
Overall, this thesis demonstrates that it is possible to reduce power consumption and improve the performance of NoC-based CMPs through hardware- and software-directed optimization schemes.