Content Addressable Data Management

Open Access
- Author:
- Nath, Partho
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- February 26, 2007
- Committee Members:
- Anand Sivasubramaniam, Committee Chair/Co-Chair
Bhuvan Urgaonkar, Committee Member
Padma Raghavan, Committee Member
Qian Wang, Committee Member
Michael A Kozuch, Committee Member
Raj Acharya, Committee Member
- Keywords:
- content addressable storage
content addressable parallel file system
internet suspend resume
- Abstract:
- A direct implication of industry and academia alike proclaiming the age of tera-scale (and even peta-scale) computing is that applications have become more data-intensive than ever. The growing data volumes produced by applications that tackle ever larger problems have fueled the need for efficient management of this data. In this thesis, we evaluate a technique called Content Addressable Storage (CAS) for managing large volumes of data. This evaluation focuses on the benefits and drawbacks of using CAS for: i) improved application performance via lockless, lightweight synchronization of accesses to shared storage data; ii) improved cache performance; iii) increased storage capacity; and iv) increased network bandwidth. We present the design of a CAS-based file store that significantly improves storage performance while providing lightweight, lock-less, user-defined consistency semantics. As a result, our file system shows a 28% increase in read bandwidth and a 13% increase in write bandwidth over a popular file system in common use. We use the same experimental file system to analyze CAS on data from real-world application benchmarks. We also estimate the potential benefits of using CAS for a virtual-machine-based user mobility application that was in active use in a public deployment over a period of seven months.
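To make the core idea of CAS concrete, the following is a minimal in-memory sketch. It assumes fixed-size blocks and SHA-256 as the content hash. The class `ContentAddressableStore` and its `put`/`get` methods are hypothetical and used for illustration only; this is not the file store designed in the dissertation. Each block is stored under the digest of its own bytes, so identical blocks collapse into a single stored copy. This is the general mechanism behind the storage-capacity gains listed above. Because a block's address is derived from its content, a stored block is never overwritten in place. That immutability is one general property that lets CAS systems reduce the need for locking.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory CAS (hypothetical, for illustration only).

    Each block is stored under the SHA-256 digest of its bytes; the block
    size and hash function are assumptions, not taken from the dissertation.
    """

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = {}  # hex digest -> block bytes

    def put(self, data: bytes) -> list:
        """Split data into fixed-size blocks, store each distinct block once,
        and return the ordered list of digests (a 'recipe' for the data)."""
        recipe = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Identical blocks hash to the same key, so they are stored only once.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        return recipe

    def get(self, recipe: list) -> bytes:
        """Reassemble the original data by looking up each digest in order."""
        return b"".join(self.blocks[d] for d in recipe)


if __name__ == "__main__":
    store = ContentAddressableStore(block_size=4096)
    data = b"A" * 8192 + b"B" * 4096  # three blocks, two of them identical
    recipe = store.put(data)
    print(len(recipe), "blocks referenced,", len(store.blocks), "blocks stored")
    assert store.get(recipe) == data
```

Running the example stores two distinct blocks for three block references: the two identical `A` blocks share one digest. Real CAS systems typically add persistence, reference counting or garbage collection for unreferenced blocks, and sometimes content-defined rather than fixed-size chunking.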