DESIGN AND IMPLEMENTATION OF ERASURE CODING IN CASSANDRA
Open Access
- Author:
- Ramesh, Ranjitha
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- June 24, 2021
- Committee Members:
- Chitaranjan Das, Program Head/Chair
Bhuvan Narendra Urgaonkar, Thesis Advisor/Co-Advisor
Viveck Ramesh Cadambe, Thesis Advisor/Co-Advisor
- Keywords:
- Erasure Coding
Cassandra
Distributed System
Algorithm
- Abstract:
- High availability and fault tolerance are two important features of any distributed system. To achieve them, we often resort to replication: we save extra copies of the data at different locations so that we can fetch the closest copy or, in case of failures, reach additional copies. Erasure coding is a space-efficient technique that stores data and offers redundancy at the same time. In this work, we leverage erasure coding in Cassandra, a widely used NoSQL distributed database, to demonstrate its efficiency and trade-offs. We propose an algorithm that fits Cassandra’s architecture closely and, with minimal overhead and minimal changes to the existing structure, delivers more than 50% storage efficiency compared to replication. Our algorithm is simple and flexible, giving users considerable control over the erasure coding schema, the threshold data size that triggers erasure coding, and multi-read, multi-write, single-read, and single-write of the entries, all while preserving major features such as sorting, indexing, searching, and range queries by the partition and clustering keys. This is a novel approach in which we keep latency as close as possible to that of the original design while improving storage efficiency. There has been a lot of prior work on erasure coding, which is discussed in the related work section. This is followed by a more detailed discussion of the challenges and benefits of our approach to erasure coding in Cassandra.
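The storage argument in the abstract can be illustrated with a small calculation. The sketch below is not taken from the thesis: the (k, m) schemas, the replication factor of 3, and the function names are all assumptions chosen for illustration. It compares the raw bytes stored per byte of user data under replication and under a generic (k data, m parity) erasure code.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data with `copies` full replicas."""
    return Fraction(copies)


def erasure_overhead(k: int, m: int) -> Fraction:
    """Raw bytes stored per byte of user data when the data is split into
    k fragments and m parity fragments are added (hypothetical schema)."""
    return Fraction(k + m, k)


def savings_vs_replication(k: int, m: int, copies: int) -> Fraction:
    """Fraction of storage saved by a (k, m) code relative to replication."""
    return 1 - erasure_overhead(k, m) / replication_overhead(copies)


if __name__ == "__main__":
    copies = 3  # assumed replication factor, not a figure from the thesis
    for k, m in [(4, 2), (6, 3), (10, 4)]:  # illustrative schemas only
        overhead = float(erasure_overhead(k, m))
        saved = float(savings_vs_replication(k, m, copies))
        print(f"k={k}, m={m}: {overhead:.2f}x stored, "
              f"{saved:.1%} saved vs {copies}x replication, "
              f"tolerates {m} lost fragments")
```

Under these assumptions, a code with two parity fragments tolerates the same number of losses as three-way replication (two) while storing 1.5 bytes per user byte instead of 3. The actual schema and the measured savings are those reported in the thesis itself.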