Application-Aware On-Chip Networks

Open Access
Author:
Das, Reetuparna
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
March 26, 2010
Committee Members:
  • Chitaranjan Das, Dissertation Advisor
  • Chitaranjan Das, Committee Chair
  • William Kenneth Jenkins, Committee Member
  • Vijaykrishnan Narayanan, Committee Member
  • Yuan Xie, Committee Member
  • Ravishankar Iyer, Committee Member
  • Onur Mutlu, Committee Member
Keywords:
  • On-chip networks
  • multi-core
  • arbitration
  • prioritization
  • memory systems
  • packet scheduling
  • slack
  • criticality
  • topology
  • compression
  • hierarchical
Abstract:
Multi-hop, packet-based Network-on-Chip (NoC) architectures are widely viewed as the de facto solution for interconnecting the nodes of many-core architectures, owing to their scalability and their well-controlled, highly predictable electrical properties. The NoC has become an important research focus in recent years because the network plays a critical role in determining the performance and power behavior of a many-core architecture. Most solutions proposed for NoC research problems optimize the network in isolation, without exploiting the characteristics of applications or the software stack. This thesis offers a different perspective: designing high-performance, scalable, and energy-efficient NoCs by exploiting application characteristics. I show that we can design far better on-chip networks if we understand application behavior and customize the network accordingly. I propose application-aware packet scheduling policies for on-chip networks, hierarchical topologies that exploit the communication locality of applications, and data compression techniques that exploit the value locality inherent in application data traffic.

The first contribution of this thesis is application-aware packet scheduling policies for NoCs. The NoC is likely to become a critical shared resource in future many-core processors; the challenge is to develop policies and mechanisms that enable multiple applications to share the network efficiently and fairly, improving overall system performance. A key router component that influences application-level performance and fairness is the arbitration/scheduling unit. Existing arbitration and packet scheduling policies in NoCs are local and application-oblivious. We observe, however, that differing application characteristics lead to differential criticality among packets: some packets matter more to processor execution time than others.
This novel insight enables us to design packet scheduling policies that provide high performance in on-chip networks. First, I propose a coordinated, application-aware prioritization substrate. The idea is to divide processor execution time into phases; within each phase, rank applications by how critical the network is to each application's performance (or by system-level application priorities); and have all routers in the network prioritize packets according to their applications' ranks in a coordinated fashion. Our scheme includes techniques that ensure starvation freedom and enable the enforcement of system-level application priorities, resulting in a configurable substrate for application-aware prioritization in on-chip networks.

Next, I propose a new architecture, Aergia, that exploits slack in packet latency. We define the slack of a packet as the number of cycles the packet can be delayed in the network with no effect on execution time; slack thus serves as a key measure of a packet's relative importance. We propose new router prioritization policies that exploit the available slack of interfering packets in order to accelerate performance-critical packets and thus improve overall system performance: when two packets interfere with each other in a router, the packet with the lower slack value is prioritized. I describe mechanisms to estimate slack, prevent starvation, and combine slack-based prioritization with the application-aware prioritization substrate proposed above.

The second contribution of this thesis is application-aware hierarchical topologies. This proposal leverages the insight that applications mapped onto a large CMP system benefit from clustered communication, in which data is placed in cache banks close to the cores that access it. We therefore design a hierarchical network topology that takes advantage of such communication locality.
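The combined rank- and slack-based arbitration described above can be illustrated with a minimal sketch. All names, the priority encoding (lower rank and lower slack win), and the aging threshold for starvation freedom are hypothetical simplifications, not the dissertation's actual hardware design:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    app_rank: int   # lower rank = application more sensitive to network delay (assumed encoding)
    slack: int      # estimated cycles this packet can be delayed without hurting execution time
    age: int = 0    # cycles spent waiting in the router, used for starvation avoidance

MAX_AGE = 256  # hypothetical threshold after which a packet must be served

def arbitrate(contending):
    """Pick one packet among those contending for an output port.

    Packets that have waited past MAX_AGE are served first (starvation
    freedom); otherwise, prioritize by application rank, breaking ties
    with the lower slack value.
    """
    starved = [p for p in contending if p.age >= MAX_AGE]
    pool = starved if starved else contending
    return min(pool, key=lambda p: (p.app_rank, p.slack))
```

For example, between two packets of the same application, the one with less slack wins arbitration; across applications, the higher-ranked (more network-critical) application wins unless an old packet must be drained first.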
The two-tier hierarchical topology consists of local networks connected by a global network. Each local network is a simple, high-bandwidth, low-power shared bus fabric, and the global network is a low-radix mesh. Since most communication in CMP applications can be confined to the local network, handling local traffic over a fast, low-power bus improves both network latency and power efficiency.

The final contribution of this thesis is data compression techniques for on-chip networks. In this context, we examine two configurations that explore combinations of storage and communication compression: (1) Cache Compression (CC) and (2) Compression in the NIC (NC). We also present techniques that hide decompression latency by overlapping it with communication latency, and we comprehensively characterize and quantify the effect of data compression on NoCs. The benefits seen in our evaluations make a strong case for using compression to optimize the performance and power envelope of NoC architectures. Finally, I exploit the compressibility of application data traffic to improve throughput through a novel router microarchitecture called XShare, which utilizes the data value locality and bimodal traffic characteristics of CMP applications to transfer multiple small flits over a single channel.
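The flit-sharing idea behind XShare can be sketched as follows. This is a simplified software analogy, not the actual microarchitecture: the channel width, the "small payload fits in half a channel word" criterion, and the pack/unpack framing are all assumptions for illustration:

```python
CHANNEL_BITS = 128              # hypothetical channel/flit width in bits
HALF = CHANNEL_BITS // 2
HALF_MASK = (1 << HALF) - 1

def is_small(payload: int) -> bool:
    # Narrow values (zeros, small integers) are common in CMP data traffic;
    # here "small" means the payload fits entirely in half the channel width.
    return 0 <= payload <= HALF_MASK

def pack(a: int, b: int):
    """Return one channel word carrying both payloads, or None if either is too wide."""
    if is_small(a) and is_small(b):
        return (a << HALF) | b
    return None

def unpack(word: int):
    """Recover the two payloads packed into a single channel word."""
    return word >> HALF, word & HALF_MASK
```

When traffic is bimodal (many narrow payloads mixed with full-width ones), pairing two small flits onto one channel traversal effectively raises channel utilization and throughput.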