A Cache Topology Aware Multi-Query Scheduler for Multicore Architectures

Open Access
Orhan, Umut
Graduate Program:
Computer Science and Engineering
Master of Science
Document Type:
Master's Thesis
Date of Defense:
July 08, 2011
Committee Members:
  • Mahmut Taylan Kandemir, Thesis Advisor
Keywords:
  • multicore architectures
  • database systems
  • cache topology
As mainstream computer chip architectures switch from single-core machines to multicore ones, it is becoming increasingly important to exploit multicore-specific characteristics to extract maximum performance. One of these characteristics is the existence of shared on-chip caches, through which different threads/processes can share data (help each other) or displace each other's data (hurt each other). Most current commercial multicore systems have on-chip cache hierarchies with multiple layers (typically L1, L2, and L3, the last two being either fully or partially shared). In the context of database workloads, exploiting the full potential of these caches can be critical. Motivated by this, our main contribution in this work is to present and experimentally evaluate a cache hierarchy-aware query mapping scheme targeting workloads that consist of batch queries to be executed on emerging multicores. Our proposed scheme distributes a given batch of queries across the cores of a target multicore architecture based on the affinity relations among the queries. The primary goal behind this scheme is to maximize the utilization of the underlying on-chip cache hierarchy while keeping the load nearly balanced across affinity domains. Each affinity domain in this context corresponds to a cache structure at a particular level of the cache hierarchy. A graph partitioning-based method distributes queries across cores, and an integer linear programming (ILP) formulation addresses locality and load-balancing concerns. We evaluate our scheme using the TPC-H benchmark on two commercial multicore machines with different on-chip cache topologies. Our solution achieves up to 25% improvement in individual query execution times and 15%-19% improvement in throughput over the default Linux-based process scheduler.
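To illustrate the general idea of affinity-based query-to-domain mapping, the following is a minimal sketch, not the thesis's actual method: the thesis employs graph partitioning and an ILP formulation, whereas this toy version uses a simple greedy heuristic. The query names, affinity weights, and the `capacity` parameter are all hypothetical.

```python
def map_queries(affinity, num_domains, capacity):
    """Greedily assign each query to the cache-affinity domain where it
    shares the most data with already-placed queries, subject to a
    per-domain load cap (a crude stand-in for load balancing).

    affinity: dict mapping query -> {other_query: shared-data weight}
    """
    domains = [[] for _ in range(num_domains)]
    # Place heaviest data-sharers first so their partners can follow them.
    order = sorted(affinity, key=lambda q: -sum(affinity[q].values()))
    for q in order:
        best, best_score = None, -1
        for d, members in enumerate(domains):
            if len(members) >= capacity:
                continue  # domain full: enforce the load cap
            # Total affinity between q and queries already in this domain.
            score = sum(affinity[q].get(m, 0) for m in members)
            if score > best_score:
                best, best_score = d, score
        domains[best].append(q)
    return domains

# Hypothetical batch: q1/q2 share data heavily, as do q3/q4.
affinity = {
    "q1": {"q2": 5},
    "q2": {"q1": 5},
    "q3": {"q4": 4},
    "q4": {"q3": 4},
}
print(map_queries(affinity, num_domains=2, capacity=2))
# Sharers end up co-located in the same affinity domain (shared cache),
# and both domains receive an equal load of two queries each.
```

A real scheduler would derive the affinity weights from which tables or intermediate results the queries touch, and would solve the placement exactly (e.g., via graph partitioning or ILP) rather than greedily.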