Exploring Power Reliability Tradeoffs in On-Chip Networks

Open Access
Yanamandra, Aditya
Graduate Program:
Computer Science and Engineering
Doctor of Philosophy
Document Type:
Date of Defense:
June 18, 2010
Committee Members:
  • Mary Jane Irwin, Dissertation Advisor
  • Mary Jane Irwin, Committee Chair
  • Vijaykrishnan Narayanan, Committee Chair
  • Mahmut Taylan Kandemir, Committee Member
  • Padma Raghavan, Committee Member
  • Vittaldas V Prabhu, Committee Member
Keywords:
  • Electromigration
  • Soft Errors
  • Reliability
  • Network on chip
  • Router Cache
  • Iso-reliable
Abstract:
The past decade has seen the arms race in the microprocessor industry shift from increasing clock frequency to increasing the number of cores on a chip. Power and performance considerations have motivated the adoption of chip multiprocessors (CMPs). As a result, there is an emphasis on the performance of the on-chip interconnection network that connects these cores. Bus-based designs have so far dominated on-chip interconnection networks in industry. There are early indications that buses are being replaced by packet-based Networks-on-Chip (NoCs) because of the scalability issues of on-chip buses.

Reliability concerns such as soft errors, process variation, and premature transistor failure threaten future technology scaling, and NoCs are beset by the same challenges. The distributed nature of the NoC allows a different approach to solving these issues. In this dissertation, we address several phenomena that cause errors in the NoC domain. We also address the reliability impact of several techniques proposed in the literature to improve the performance and power of the NoC.

NoC power consumption has been identified as a chief limiter to adoption in industry. Providing reliability costs area and power on the chip. For example, error correction codes (ECC) are employed to mitigate the effect of soft errors on the NoC. Such power requirements place an additional burden on chip designers. In this dissertation, we propose techniques to reduce the power required to maintain iso-reliable systems. Reducing this power is an important step toward highly reliable, low-power NoCs, and the techniques proposed in this dissertation aim to achieve such a system.