Temperature-Aware Computing

Open Access
Link, Greg
Graduate Program:
Computer Science and Engineering
Doctor of Philosophy
Document Type:
Date of Defense:
May 22, 2006
Committee Members:
  • Vijaykrishnan Narayanan, Committee Chair
  • Mary Jane Irwin, Committee Member
  • Chitaranjan Das, Committee Member
  • Kenneth Jenkins, Committee Member
Keywords:
  • design automation
  • architecture
  • thermal
  • Temperature
  • hotspot
  • hot spot
In the future, the peak temperature of a chip will be a primary design constraint. Higher temperatures can accelerate various chip failure mechanisms, reducing the lifetime of the system. They also place an additional burden on cooling systems, which must prevent thermal runaway caused by the increase in standby (leakage) power consumption at high temperature. Consequently, temperature must be considered in the earliest phases of the design process. Many existing thermal management techniques throttle performance to reduce the overall power consumption of the chip, which eventually lowers the overall chip temperature. These techniques, while effective, often do not address localized temperature problems, referred to as hotspots. Recent research into hotspots has shown that different functional units in general-purpose processors can have significantly different temperature profiles, and that moving workloads between units can reduce the formation of hotspots on the die. Using a newly developed thermal analysis tool, HS3d, this work explores the thermal profile of modern processor architectures and discusses the types and characteristics of hotspots in future technologies, as process variation, multi-core design, and multi-wafer stacking techniques become prevalent. Means of mitigating these hotspots are presented, including workload migration for homogeneous architectures and techniques for reducing hotspots near the integer ALU. One proposed method, integer offloading to floating-point, redirects integer operations to the floating-point hardware; this slightly increases latency and power consumption but distributes heat more evenly across the die. Finally, a model of the impact of temperature on circuit timing is presented, and the impact of temperature gradients on multi-core processors is explored, showing that by the 45nm technology node, thermally induced timing variations of 5% per 10 °C are possible.
Traditional worst-case design techniques, which assume a single high temperature for the entire device, therefore cannot take full advantage of the much more common typical-case conditions. This thesis discusses how thermal-aware design can be incorporated into the automated design flow, allowing variable-frequency systems to achieve maximal performance across a wide operating range.
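To make the worst-case versus typical-case argument concrete, the following is a minimal sketch, not the thesis's timing model. It assumes that critical-path delay grows linearly with temperature, delay(T) = d_ref · (1 + k · (T − T_ref)), with k = 0.005 per °C to match the 5% per 10 °C figure above. The reference delay, reference temperature, per-core temperatures, and all function names are hypothetical values chosen for illustration. It compares clocking every core for the hottest core's temperature (worst-case design) against clocking each core for its own temperature (a variable-frequency, thermal-aware design).

```python
"""Sketch: linear temperature-dependent delay and per-core clock frequency.

Assumptions (hypothetical, for illustration only):
  - critical-path delay grows linearly with temperature:
        delay(T) = D_REF_NS * (1 + K_PER_DEG_C * (T - T_REF_C))
  - K_PER_DEG_C = 0.005, i.e. 5% per 10 degrees C;
  - D_REF_NS, T_REF_C and the example core temperatures are made up.
"""

K_PER_DEG_C = 0.005  # fractional delay increase per degree C (assumed)
T_REF_C = 45.0       # temperature at which D_REF_NS was characterized (assumed)
D_REF_NS = 0.25      # critical-path delay at T_REF_C, in ns (assumed; 4 GHz)


def critical_path_delay_ns(temp_c: float) -> float:
    """Critical-path delay in ns at the given temperature (linear model)."""
    return D_REF_NS * (1.0 + K_PER_DEG_C * (temp_c - T_REF_C))


def max_frequency_ghz(temp_c: float) -> float:
    """Highest clock frequency in GHz whose period still covers the critical path."""
    return 1.0 / critical_path_delay_ns(temp_c)


def worst_case_frequencies(core_temps_c: list[float]) -> list[float]:
    """Worst-case design: every core is clocked for the hottest core's temperature."""
    f = max_frequency_ghz(max(core_temps_c))
    return [f] * len(core_temps_c)


def thermal_aware_frequencies(core_temps_c: list[float]) -> list[float]:
    """Variable-frequency design: each core is clocked for its own temperature."""
    return [max_frequency_ghz(t) for t in core_temps_c]


if __name__ == "__main__":
    temps = [65.0, 85.0, 70.0, 105.0]  # example per-core temperatures in C (made up)
    wc = worst_case_frequencies(temps)
    ta = thermal_aware_frequencies(temps)
    for t, f_wc, f_ta in zip(temps, wc, ta):
        print(f"{t:6.1f} C  worst-case {f_wc:.3f} GHz  thermal-aware {f_ta:.3f} GHz")
    # Relative gain in summed clock frequency across all cores.
    print(f"aggregate gain: {sum(ta) / sum(wc) - 1:.1%}")
```

With these made-up temperatures, every core in the worst-case scheme runs at about 3.077 GHz (the 105 °C limit), while the cooler cores in the thermal-aware scheme run faster. The summed frequency is about 10.5% higher. The linear delay model is a first-order simplification chosen for readability; real temperature dependence of delay also varies with supply voltage and threshold voltage.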