Irregularity-aware Computation and Data Management in Manycore Systems
Open Access
- Author:
- Tang, Xulong
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 12, 2019
- Committee Members:
- Mahmut Taylan Kandemir, Dissertation Advisor/Co-Advisor
Mahmut Taylan Kandemir, Committee Chair/Co-Chair
Chita R. Das, Committee Member
John (Jack) Sampson, Committee Member
Dinghao Wu, Outside Member
- Keywords:
- GPGPUs
Manycore architecture
Data locality
Irregular applications
- Abstract:
- During the past decade, the slowdown of transistor technology scaling has brought chip design into the "post-Moore" era, where integrating more transistors into a single-core system no longer yields performance gains because of the power wall and the utilization wall. As a revolutionary success, manycore systems have rapidly penetrated various markets such as desktops, laptops, servers, mobile devices, and IoT devices. The amount of resources in these manycore systems is scaling out, forming various parallel systems such as Graphics Processing Units (GPUs), manycore CPUs, and heterogeneous datacenters. These systems provide enormous computing capability and have become the default platforms for communities such as scientific computing, large-scale data analytics, entertainment, and deep learning, where high performance, accuracy, and quality of service are of concern. However, the delivered performance rarely keeps up with the growing amount of resources, for two major reasons. First, applications' intrinsic irregularity prevents them from utilizing resources effectively and efficiently. Second, current systems cannot dynamically and automatically adapt to those application characteristics. Targeting these challenges, this dissertation systematically investigates opportunities across the software-hardware stack (i.e., compiler, runtime system, and architecture) with the goal of improving performance and energy efficiency for applications, especially those with irregular computation and data access patterns. Specifically, this dissertation consists of four parts. First, focusing on irregular applications running on GPUs, it proposes controlled computation spawning to dynamically improve compute resource utilization and balance computation across parallel computing engines. Second, targeting the poor cache performance of irregular applications, it proposes a dynamic runtime approach to exploit data reuse and improve cache locality. Third, focusing on data access parallelism, it proposes a compiler-directed approach to improve memory bank-level parallelism. Finally, in addition to memory bank-level parallelism, it proposes co-optimization strategies to maximize cache-level parallelism while keeping memory bank-level parallelism maximized.