Analyzing the Benefits of Scratchpad Memories for Scientific Matrix Computations

Open Access
Author:
Cover, Bryan Alan
Graduate Program:
Computer Science and Engineering
Degree:
Master of Science
Document Type:
Master Thesis
Date of Defense:
April 04, 2008
Committee Members:
  • Mary Jane Irwin, Thesis Advisor
  • Padma Raghavan, Thesis Advisor
Keywords:
  • software-controlled
  • CMP
  • computer architecture
  • cache
  • scratchpad
  • memory
  • matrix multiplication
Abstract:
Scratchpad memories (SPMs) have been shown to be more energy efficient, to have faster access times, and to take up less area than traditional hardware-managed caches. These advantages, together with predictable data presence and improved thermal behavior, make SPMs an attractive alternative to caches for many scientific applications. This work evaluates SPM-based systems in several roles. The first study analyzes the thermal and area properties of SPMs on a conventional RISC processor. Six performance-optimized architecture variants are explored to evaluate the impact of having an SPM in the on-chip memory hierarchy. The work also examines how to increase the performance and energy efficiency of both dense and sparse matrix-vector multiplication on a chip multiprocessor. Efficient utilization of the SPM is ensured by profiling the application to identify the data structures that do not perform well in a traditional cache. The impact of using an SPM at all levels of the on-chip memory hierarchy is evaluated through three SPM-based architectures. For the chip layout, the results show that the average component temperature of a chip decreases by as much as 1.6% and that the total chip area can be reduced by nearly 30%. The dense matrix-vector multiplication kernel showed as much as a 60% increase in performance and a decrease in the energy-delay product of as much as 84%. For the sparse matrix-vector multiplication kernel, the experimental results show an increase in performance of as much as 41% and a decrease in the on-chip energy-delay product of as much as 67%. These gains depend on the level of the hierarchy at which the SPM is used and on the specific sparse matrix being multiplied.
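
For readers unfamiliar with the sparse kernel named in the abstract, the following is a minimal illustrative sketch of sparse matrix-vector multiplication. It is not taken from the thesis: the compressed sparse row (CSR) layout, the function name spmv_csr, and the small example matrix are all assumptions made for illustration. The comments mark which arrays are read sequentially and which are read through an index, since data-dependent accesses of this kind are the sort of pattern that profiling might flag as performing poorly in a hardware-managed cache.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # values and col_idx are read sequentially within each row.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read through col_idx, so its access pattern depends
            # on the sparsity structure of the particular matrix.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

In this sketch the reads of x jump between positions dictated by col_idx, which is one plausible reason the abstract notes that results vary with the specific sparse matrix being multiplied.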