Design and Analysis of Scheduling Techniques for Throughput Processors

Open Access
- Author:
- Jog, Adwait
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 12, 2015
- Committee Members:
- Chitaranjan Das, Dissertation Advisor/Co-Advisor
Chitaranjan Das, Committee Chair/Co-Chair
Mahmut Taylan Kandemir, Committee Member
Yuan Xie, Committee Member
William Kenneth Jenkins, Committee Member
Ravishankar Iyer, Special Member
Onur Mutlu, Special Member
- Keywords:
- GPUs
Bandwidth
Latency
Parallelism
Memory-Systems
Scheduling
Prefetching
- Abstract:
- Throughput processors such as Graphics Processing Units (GPUs) are becoming an integral part of almost every computing system because of their ability to accelerate applications with abundant parallelism. They are not only used to accelerate big data analytics in cloud data centers and high-performance computing (HPC) systems, but are also employed in mobile and wearable devices for efficient execution of multimedia-rich applications and smooth rendering of displays. In spite of the highly parallel structure of GPUs and their ability to execute many threads concurrently, they are far from achieving their theoretical peak performance. This shortfall is attributed to several factors, such as contention for limited shared resources (e.g., caches and memory), high control-flow divergence, and limited off-chip memory bandwidth. Another reason for the low utilization and subpar performance is that current GPUs are not well equipped to efficiently and fairly execute multiple applications concurrently, potentially originating from different users. This dissertation is focused on managing contention in GPUs for shared cache and memory resources caused by concurrently executing threads. This contention causes severe loss in performance, fairness, locality, and parallelism. To manage this contention, this dissertation proposes techniques that are employed at two different places: the core and the memory. First, this dissertation shows that by intelligently scheduling the threads at the core, the generated memory request patterns can be made more amenable to existing resource management techniques, such as cache replacement and memory scheduling, as well as performance enhancement techniques such as data prefetching. Second, this dissertation shows that considering criticality and other application characteristics when scheduling memory requests at the memory controller is an effective way to manage contention at the memory.