Hardware-Aware Computation Reorganization for Memory Intensive Applications

Restricted (Penn State Only)
Author:
Kislal, Orhan Memduh
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
February 22, 2018
Committee Members:
  • Mahmut Taylan Kandemir, Dissertation Advisor
  • Mahmut Taylan Kandemir, Committee Chair
  • Kamesh Madduri, Committee Member
  • John Morgan Sampson, Committee Member
  • Conrad S Tucker, Outside Member
Keywords:
  • memory
  • performance
  • compiler
  • approximate computing
Abstract:
After hitting the power wall, the dramatic change in computer architecture from single core to multicore/manycore brings us new challenges on high performance computing, especially for data intensive applications. Data access costs dominate the execution times of most parallel applications and they are expected to be even more important in the future. Under these circumstances, the organization of data and computation across available resources becomes a major effect on the performance of the overall system. This dissertation explores the reorganization problem from a hardware-aware perspective to fully harness the underlying architecture and demonstrates various methods to improve the memory performance. These methods span both both domain-specific solutions for some memory-intensive kernels of high importance as well as domain-agnostic optimization techniques. This dissertation approaches the problem of reorganization from two different perspectives. While the traditional methods of organization for data and computation, namely mapping and scheduling, remain highly influential and beneficial; we also evaluate the idea of approximate computing in this context and reorganize data and computation based on their predicted importance. Our exploration includes following steps. On the domain-specific side; we apply mapping, scheduling and data layout reorganization techniques to the sparse matrix vector multiplication problem. In addition, we improve the k-means clustering algorithm with computation reordering as well as multiple skipping heuristics, and propose a cache skipping module for data mining algorithms and explore its benefits with recursive partitioning algorithms. On the domain-agnostic side; we explore location aware and data movement aware computation reorganization techniques, as well as, a code slicing technique that skips high-cost and low-importance data accesses. Our detailed experiments show significant improvements in all cases, up to 25% for domain-specific optimizations and up to 18% for domain-agnostic techniques.