Improving the Fidelity of Trace-Driven Experiments in Cloud Computing Systems
Open Access
- Author:
- Sajal, Sultan Mahmud
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- April 17, 2024
- Committee Members:
- Chitaranjan Das, Program Head/Chair
Qunhua Li, Outside Unit & Field Member
Timothy Zhu, Co-Chair & Dissertation Advisor
Bhuvan Urgaonkar, Co-Chair & Dissertation Advisor
Siddhartha Sen, Special Member
Ruslan Nikolaev, Major Field Member
- Keywords:
- Workload Analysis
Cloud Computing
Performance Evaluation
Analysis of Systems
Testing of Systems
Verification of Systems
- Abstract:
- Realistic experimentation is an important part of computer systems research and of prototyping new features and ideas in industry. The ideal way to evaluate a system is to collect real-world data and traces from production systems and replay them on a representative system and application. However, real-world data is not always readily suited to the requirements of the experimental system. In many use cases, the real-world traffic is collected from a system that is much larger than the experimental system, and directly replaying that traffic will overwhelm the experimental system, causing overload and extremely long queueing times. Hence, the load of the trace must be downscaled before it can be replayed in the experimental setup. On the other hand, many experiments require a higher load than is present in the trace. In these cases, the trace must be upscaled to artificially increase its load. At the same time, practitioners do not always have access to the production system and instead use an experimental testbed for experimentation and prototyping. This poses various challenges in creating a realistic experimental setup using trace data collected from another setup. One major downside is that it becomes impossible to recreate the exact environment in which the trace was collected, which prevents a proper evaluation of the system. In this thesis, we address these issues in trace-driven systems research by exploring existing practices and their potential pitfalls, and then proposing novel techniques to mitigate them in order to facilitate more realistic experiments in systems research. The first part of this thesis (Chapter 3) explores the common trace downscaling techniques used in practice and identifies their potential shortcomings.
This part describes TraceSplitter, our novel downscaling technique, which downscales traces while maintaining important trace and latency characteristics by using existing load balancing techniques. Through extensive experiments using real-world and synthetic traces in a social media web application, we demonstrate how existing techniques can fail to preserve trace characteristics in downscaled traces and how TraceSplitter preserves them better. We also demonstrate the real-world implications of this via a case study that calculates the required over-provisioning in autoscaling systems, showing how incorrect downscaling can lead to erroneous conclusions about the systems. In the second part of this thesis (Chapter 4), we describe the problem of trace upscaling and the popular trace upscaling techniques employed in systems research. We explore the scenarios where these existing approaches fall short and propose a novel trace upscaling tool, TraceUpscaler, that overcomes those shortcomings and upscales traces realistically, producing traces with higher loads while maintaining the trace characteristics. The key idea behind TraceUpscaler is to repeat each timestamp according to the scaling factor while reusing the request parameters from the original trace. Through rigorous experiments with real-world and synthetic traces in stateful and stateless systems, we demonstrate that existing upscaling techniques fail to preserve important trace characteristics and that TraceUpscaler preserves them far better. Finally, the last part of this thesis (Section 5.2.1) addresses the challenge of recreating the production system environment for realistic experimentation in the context of microservice architecture.
This part advocates for a new framework for collecting traces and replaying them during experimentation to measure the performance of the service(s) of interest in isolation from the variability in the rest of the system. The framework consists of a new trace collection technique that minimizes engineering effort and storage requirements and creates models of the rest of the system for realistic recreation of the trace collection environment.
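
As a rough illustration of the TraceUpscaler idea stated in the abstract (repeat each timestamp according to the scaling factor and reuse request parameters from the original trace), the following minimal sketch may help. It is not the thesis implementation: the function name `upscale_trace`, the `(timestamp, parameters)` tuple layout, the restriction to integer factors, and the in-order, wrap-around assignment of parameters are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical trace record: (arrival timestamp in seconds, request parameters).
Request = Tuple[float, str]


def upscale_trace(trace: List[Request], factor: int) -> List[Request]:
    """Illustrative upscaling: every original timestamp appears `factor` times.

    Assumption: request parameters are drawn from the original trace in order,
    wrapping around when the end of the trace is reached. The thesis may assign
    parameters differently.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    if not trace:
        return []

    params = [p for _, p in trace]
    upscaled: List[Request] = []
    next_param = 0  # index of the next original request parameter to reuse
    for timestamp, _ in trace:
        for _ in range(factor):
            upscaled.append((timestamp, params[next_param % len(params)]))
            next_param += 1
    return upscaled


if __name__ == "__main__":
    original = [(0.0, "GET /a"), (1.5, "GET /b")]
    for record in upscale_trace(original, 2):
        print(record)
```

In this sketch the upscaled trace carries `factor` times as many requests, arrival times come only from the original trace, and no new request parameters are invented.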