Open Access
Kesten, Tuba
Graduate Program:
Computer Science and Engineering
Master of Science
Document Type:
Master Thesis
Date of Defense:
December 01, 2014
Committee Members:
  • Chitaranjan Das, Thesis Advisor
  • Mahmut Taylan Kandemir, Thesis Advisor
Keywords:
  • GPU
  • multiple application
  • concurrent execution
  • CUDA
  • Hyper-Q
  • Kepler
Graphics Processing Units (GPUs) are widely used to accelerate general-purpose computation in diverse areas such as manufacturing, research, health, life sciences, and engineering. To achieve better speedups, each new GPU generation provides more hardware resources. In practice, however, most GPU applications in high-performance computing cannot effectively utilize all of the resources in the system because they lack sufficient thread-level parallelism. This underutilization hurts overall performance in terms of speedup and throughput. To fully exploit the capabilities of GPUs, concurrent execution of applications is critical. NVIDIA’s recent GPU architecture, Kepler, manages concurrency through its Hyper-Q technique, which assigns a separate work queue to each application. We first propose a flexible, Hyper-Q-like multi-application framework capable of simulating 2-application and 3-application workloads. Our framework supports 25 applications and 300 2-application workloads chosen from the Parboil, Rodinia, SHOC, and CUDA application suites. It allows programmers to adapt their CUDA code for concurrent execution with little programming effort. Further, we study application interference in multi-application workloads under different core partitioning schemes. We characterize the applications in our suite based on their Misses Per Kilo-Instruction (MPKI) values. We evaluate the 2-application workloads constructed from these applications using various performance metrics: Weighted Speedup (WS/STP), Instruction Throughput (IT), Average Normalized Turnaround Time (ANTT), Fairness Index (FI), and Bandwidth Utilization. The experiments show that MPKI alone is not enough to analyze the interaction among applications; the attained bandwidth of each application also needs to be taken into consideration.
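The metrics named in the abstract are not defined there. As a rough illustration only, the sketch below uses definitions that are common in the multi-program evaluation literature, and these may differ from the ones used in the thesis: each application's progress is its IPC when sharing the GPU divided by its IPC when running alone, WS is the sum of those ratios, ANTT is the mean of their inverses, FI is the smallest ratio divided by the largest, and IT (taken here to mean instruction throughput) is the sum of the shared IPCs. All function names and the sample IPC values are hypothetical. The last helper shows that pairing 25 applications without repetition yields the 300 2-application workloads mentioned above.

```python
from itertools import combinations


def progress(ipc_alone, ipc_shared):
    """Per-application progress: IPC when co-running divided by IPC when alone."""
    return [shared / alone for alone, shared in zip(ipc_alone, ipc_shared)]


def weighted_speedup(ipc_alone, ipc_shared):
    """WS/STP (assumed definition): sum of per-application progress."""
    return sum(progress(ipc_alone, ipc_shared))


def antt(ipc_alone, ipc_shared):
    """ANTT (assumed definition): mean per-application slowdown."""
    p = progress(ipc_alone, ipc_shared)
    return sum(1.0 / x for x in p) / len(p)


def fairness_index(ipc_alone, ipc_shared):
    """FI (assumed definition): 1.0 means all applications slow down equally."""
    p = progress(ipc_alone, ipc_shared)
    return min(p) / max(p)


def instruction_throughput(ipc_shared):
    """IT (assumed definition): total instructions per cycle across applications."""
    return sum(ipc_shared)


def mpki(misses, instructions):
    """Misses per kilo-instruction: cache misses per 1000 executed instructions."""
    return 1000.0 * misses / instructions


def two_app_workloads(apps):
    """All unordered pairs of distinct applications."""
    return list(combinations(apps, 2))


if __name__ == "__main__":
    # Hypothetical IPC values for two co-running applications.
    alone, shared = [2.0, 1.0], [1.0, 0.5]
    print("WS  :", weighted_speedup(alone, shared))
    print("ANTT:", antt(alone, shared))
    print("FI  :", fairness_index(alone, shared))
    print("IT  :", instruction_throughput(shared))
    print("MPKI:", mpki(5000, 1000000))
    print("pairs from 25 apps:", len(two_app_workloads(range(25))))
```

Under these assumed definitions, higher WS, IT, and FI are better, while lower ANTT is better.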