EXTRACTING BETTER PERFORMANCE FROM THE PARALLELISM OFFERED BY SSDS

Open Access
Author:
Elyasi, Nima
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
February 14, 2019
Committee Members:
  • Anand Sivasubramaniam, Dissertation Advisor
  • Anand Sivasubramaniam, Committee Chair
  • Mahmut Taylan Kandemir, Committee Member
  • Chitaranjan Das, Committee Member
  • Qian Wang, Outside Member
  • Changho Choi, Special Member
Keywords:
  • Solid State Drive (SSD)
  • Scheduling
  • Data Replication
  • Tail Latency
  • Parallelism
  • Performance
Abstract:
The majority of growth in the industry is driven by massive data processing, which in turn is driving a tremendous need for high-performance storage. To satisfy the demand for lower latency, large-scale computing platforms have been making heavy use of flash-based Solid State Drives (SSDs), which provide substantially lower latencies than conventional hard disk drives. To satisfy the capacity and bandwidth demands, SSDs continue to scale by (i) adopting new flash technologies such as V-NAND, and (ii) embodying more flash chips, which leads to higher levels of internal parallelism. With the tremendous growth in SSD capacity and the continuing rise in internal hardware parallelism, load imbalance and resource contention remain serious impediments to boosting SSD performance. Employing and exploiting higher levels of internal parallelism can even accentuate the load imbalance, as variable-sized requests span more flash chips and make it more complex for request schedulers to coordinate the individual queues. In addition, the widely differing latencies of the basic flash operations (read, write, and erase) exacerbate the load imbalance, since not all chips are necessarily performing the same operation at the same time. As a consequence of such an unbalanced system, SSD requests experience considerable non-uniformity and non-determinism in their service, which can in turn impair the profitability of client-facing applications. In this dissertation, remedies to alleviate these challenges are proposed, developed, and evaluated. The proposed performance-enhancement mechanisms can be incorporated in the device firmware to provide faster and more consistent service.
In this dissertation, we address the load imbalance problem by (i) exploiting the variation in queue lengths to take better advantage of the offered hardware parallelism while serving SSD requests, and (ii) balancing the load across flash chips by opportunistically re-directing it to less busy chips. First, we propose and develop a scheduling mechanism that orchestrates SSD requests at (i) arrival time, when inserting requests in their respective queues, and (ii) service time, when issuing requests on flash chips. The former estimates and leverages the skews in the completion of sub-requests in order to allow new arrivals to jump ahead of existing ones without affecting their response times. The latter, however, aims at achieving time sharing of the available resources by scheduling the sub-requests of each request in a coordinated manner at service time. Such scheduling mechanisms are targeted at reducing the response time of SSD requests by coordinating SSD requests in their respective queues. Apart from such optimizations, which are restricted in terms of the flash chips that can service a request, one can attempt to re-direct requests to other flash chips that are opportunistically free. Providing this re-direction opportunity is nontrivial for read requests. In the second part of this dissertation, we propose novel approaches to re-direct the load to less busy flash chips. With our proposed techniques, re-direction of read requests is achieved by (i) selective replication, wherein value popularity is leveraged to replicate popular data on multiple flash chips and provide more opportunities for read re-direction; and (ii) leveraging the existing RAID groups in which flash chips are organized, and reconstructing the read data from the remaining chips of the group rather than waiting for a long-latency operation to complete.
While both read redirection approaches aim at exploiting the existing redundancy to improve response times of SSD requests, the former technique is designed for a content-addressable SSD, while the latter is applicable to normal SSDs with RAID-like capabilities which are more common in high-end SSDs employed in large-scale production environments.
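As a rough illustration of the RAID-based read re-direction described above, the following minimal Python sketch assumes a RAID-5-like group with a single XOR parity page and a per-chip estimate of remaining busy time. The function names, the `busy_us` estimates, and the fixed `read_us` latency are hypothetical and chosen only for illustration; they are not taken from the dissertation.

```python
from functools import reduce


def xor_pages(pages):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def read_page(group, target, busy_us, read_us=50):
    """Return (data, path) for the page stored on chip `target`.

    group   : list of pages (bytes), one per chip; the last entry is assumed
              to be the XOR parity of all the others (hypothetical layout)
    busy_us : estimated remaining busy time of each chip, in microseconds
    read_us : assumed flash read latency, in microseconds
    """
    others = [i for i in range(len(group)) if i != target]
    direct_wait = busy_us[target] + read_us
    # Reconstruction has to wait for the slowest of the remaining chips.
    rebuild_wait = max(busy_us[i] for i in others) + read_us
    if rebuild_wait < direct_wait:
        return xor_pages([group[i] for i in others]), "reconstructed"
    return group[target], "direct"


# Three data pages plus one parity page; chip 1 is busy with a long erase.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
group = data + [xor_pages(data)]
busy = [0, 3000, 0, 0]
page, path = read_page(group, 1, busy)
```

In this example the read of chip 1 is served by XOR-ing the pages of the three idle chips instead of waiting for the erase to finish. A real firmware policy would also have to account for the extra reads it places on the other chips, which this sketch ignores.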