A Workload Mapping Method For Multicore systems Using Cross-run Statistics

Open Access
Author:
Aktasoglu, Mahmut Sami
Graduate Program:
Computer Science and Engineering
Degree:
Master of Science
Document Type:
Master Thesis
Date of Defense:
July 06, 2012
Committee Members:
  • Mahmut Taylan Kandemir, Thesis Advisor
Keywords:
  • Chip Multiprocessors
  • run-time systems
  • application mapping/scheduling
  • cache optimization
Abstract:
Multicore architectures have become the design of choice in today’s microprocessor market. Owing to significant strides in process technology and the poor scalability of monolithic designs, multicore processors are used in more and more devices. They provide applications with a variety of on-chip resources, such as caches and memory controllers. Sharing these resources efficiently among applications is not trivial, however, because destructive interference among applications is dynamic and unpredictable. Multicore systems and operating systems (OS) offer only a limited number of tools for managing on-chip resources. Hardware solutions in today’s multicore architectures include Dynamic Voltage/Frequency Scaling (DVFS) and the ability to switch cores on and off. Software solutions offered by operating systems are limited to using the underlying architecture’s mechanisms and to scheduling. Scheduling in multicore systems is the means of giving applications and threads access to system resources. An OS scheduler not only decides when to dispatch application threads, but also maps them to specific cores, which determines the resources available to each thread. Application mapping is therefore a key method for assigning resources to applications at the system level. To better utilize resources when multiple application threads are running, the resource-application mapping problem has to be addressed. In this work, we address this problem by presenting a resource-application mapping method that uses cross-run statistics. Our work is unique in that our method handles various architectures and workload sizes while exploiting data collected from prior executions of the applications, i.e., cross-run statistics. We present an algorithm that uses these statistics to decide the mapping between applications and system resources in order to improve overall performance.
Our results collected on commercial machines show that our scheme can improve overall system performance by up to 20% over the default OS scheduler.
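
The abstract does not spell out the mapping algorithm, so the following is only a minimal illustrative sketch of the general idea, not the method developed in the thesis. It assumes that each application's prior runs yield a single cache-pressure number (for example, last-level cache misses per thousand instructions) and that cores are grouped into domains sharing a cache. It then uses a simple greedy balancing heuristic. The names `history`, `cache_domains`, `average_stats`, and `map_apps_to_cores` are hypothetical.

```python
from statistics import mean


def average_stats(history):
    """Average each application's metric over its recorded prior runs."""
    return {app: mean(runs) for app, runs in history.items()}


def map_apps_to_cores(history, cache_domains):
    """Greedily assign applications to cores using cross-run averages.

    history:       {app_name: [metric from run 1, metric from run 2, ...]}
                   (assumed metric: cache pressure, higher is worse)
    cache_domains: {domain_name: [core ids that share one cache]}
    Returns {app_name: core_id}.
    """
    pressure = average_stats(history)
    free = {d: list(cores) for d, cores in cache_domains.items()}
    load = {d: 0.0 for d in cache_domains}
    mapping = {}

    # Place the most cache-hungry applications first, each into the
    # currently least-loaded cache domain that still has a free core.
    for app in sorted(pressure, key=pressure.get, reverse=True):
        candidates = [d for d in free if free[d]]
        if not candidates:
            raise ValueError("more applications than available cores")
        target = min(candidates, key=lambda d: load[d])
        mapping[app] = free[target].pop(0)
        load[target] += pressure[app]
    return mapping
```

With four applications and two dual-core cache domains, this heuristic places the two most cache-intensive applications in different domains so that they do not compete for the same cache. On Linux, a resulting core id could then be applied to a running process with `os.sched_setaffinity(pid, {core_id})`; that step is left out here because it is platform specific.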