HARDWARE SOFTWARE CO-DESIGN FOR OPTIMIZING MEMORY HIERARCHY IN MANY-CORE AND MULTI-SOCKET SYSTEMS

Restricted (Penn State Only)
Author:
Kotra, Jagadish Babu
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
August 22, 2017
Committee Members:
  • Mahmut Taylan Kandemir, Dissertation Advisor
  • Mahmut Taylan Kandemir, Committee Chair
  • Mary Jane Irwin, Committee Member
  • Kamesh Madduri, Committee Member
  • Dinghao Wu, Outside Member
Keywords:
  • Hardware-software co-design
  • memory hierarchy
  • manycore processors
  • memory
  • caches
Abstract:
Thanks to Moore’s law, the number of transistors on a chip has been increasing over time without increasing the area of the processing die. To keep the power density, i.e., the heat dissipated per unit area, from becoming too high, the additional transistors are being invested in separate cores instead of in further optimizing already complex out-of-order cores. Hence, complex uni-core systems have given way to multi- and many-core systems on a processor die of essentially the same size, resulting in an increased amount of processing per unit area. Similar to the processing end, the number of transistors on the memory side has also increased (though not at the same rate), resulting in increased memory (DRAM) capacity over the years. This growth in transistor count at the processor and memory ends has enabled significant scaling of computation and memory capacity over time in the same area. However, the speed-ups observed due to the increased processing power were not linear. This was because the number of pins that connect the processor and memory has not increased, as that would make the die bigger. As a result, with the increased number of cores, the effective memory bandwidth per core decreased over time. Apart from the reduced memory bandwidth per core, the increased memory density (capacity per unit area) has had interesting performance and power ramifications in DRAM. Because DRAM is volatile, the increased memory density requires more rows to be refreshed in effectively the same retention time. Consequently, certain sections of DRAM remain inaccessible while they are being refreshed and cannot continuously feed data to the processing elements, which further reduces overall memory bandwidth. The performance gap between the processor and memory has therefore increased significantly over time. This gap is widely referred to by researchers as the “memory wall”.
In my thesis, I have proposed various techniques to bridge the performance gap between the processor and memory. These techniques can be broadly classified into: 1. entirely hardware-based proposals, 2. entirely software-based proposals, and 3. hardware-software co-design proposals.