System Level Power and Reliability Modeling

Open Access
Author:
Lin, Ing-Chao
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
February 23, 2007
Committee Members:
  • Vijaykrishnan Narayanan, Committee Chair
  • Mary Jane Irwin, Committee Member
  • Yuan Xie, Committee Member
  • W. Kenneth Jenkins, Committee Member
  • Nagu Dhanwada, Committee Member
Keywords:
  • System-on-Chip
  • SoC
  • TLM
  • Device Degradation Modeling
  • Power Modeling
  • Negative Bias Temperature Instability
  • Hot Carrier Effect
  • NBTI
  • Reliability Modeling
  • HCE
  • Transaction level modeling
  • Bus-based
  • PCI Express
Abstract:
This thesis presents system-level models for power, reliability, and device degradation. For system-level power modeling, we use transaction level modeling (TLM), which represents the communication among IP cores as transactions and simulates faster than models at lower levels of abstraction. We construct a hierarchical power modeling tree and augment the transaction level models with power estimation functions. We demonstrate the power estimation methodology on PCI Express transaction level models, create various scenarios, and validate the methodology on the IBM CoreConnect platform. We also present experimental results that validate the accuracy and speed of our approach. For system-level reliability modeling, we propose a transaction-based error susceptibility model for a bus-based System-on-Chip. This reliability model provides a detailed analysis of different kinds of errors and of the susceptibility of such systems to these errors on the various components that comprise the bus. We inject single- and multi-bit errors during the execution of various transactions and examine their effects. Experimental results demonstrate that the error susceptibility of signals is similar across the benchmarks. This transaction-based analysis helps us develop an effective methodology for predicting the effect of single- and multi-bit errors on any application running on a bus-based architecture. We demonstrate that our transaction-based prediction scheme achieves an average accuracy of 91% over all the benchmarks when compared with the actual simulation results. For system-level modeling of device degradation, we explore how Negative Bias Temperature Instability (NBTI) and Hot Carrier Effects (HCE) cause device degradation in the system. We describe a tool we developed for the complete analysis of circuit degradation, the HCE and NBTI Incorporated Tool for ASICs (HANITA). The tool analyzes the impact of degradation on bus systems and the vulnerability of buses to such circuit degradation. We propose a hardware-based mechanism to detect timing degradation, and we further propose a PROactive BUS (PROBUS) architecture that dynamically adapts to retain system functionality even after the system timing degrades.