IMPROVING COST-EFFICACY IN PUBLIC CLOUDS: OPTIMAL CONTROL SUBJECT TO CLOUD-TENANT INTERACTIONS

Open Access
Author:
Wang, Cheng
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
September 12, 2016
Committee Members:
  • Bhuvan Urgaonkar, Dissertation Advisor
  • Bhuvan Urgaonkar, Committee Chair
  • Qian Wang, Committee Member
  • Anand Sivasubramaniam, Committee Member
  • Vinayak V Shanbhag, Outside Member
  • George Kesidis, Committee Member
  • Qian Wang, Dissertation Advisor
Keywords:
  • Public cloud
  • Tenant
  • Cost-efficacy
Abstract:
In recent years, tenant workload needs on public cloud platforms have been growing, which may lead to an increasingly competitive cloud market in which cloud providers are forced to operate their data centers at significantly higher utilization levels than seen today. As a consequence, public cloud providers may see more frequent periods when tenant needs exceed available resources, which may hurt both the cloud's and the tenants' profitability. In traditional cluster computing and private cloud settings, since the tenants have full control over the computing resources provisioned by the cloud, the cloud and the tenants can collaborate to optimize the cost-efficacy of the overall eco-system. In public clouds, however, the cloud provider and its tenants become selfish entities that interact with each other but focus only on their own profitability; multi-tenancy further complicates the situation. It is thus essential to revisit prior work in both private and public cloud settings, along with emerging technologies in public clouds, and to explore effective ways of improving both entities' cost-efficacy. On the cloud side, on the one hand, there have been proposals to minimize the cloud's operational costs via various control knobs. Despite these efforts, the problem remains challenging because of (i) the complexity of the different control knobs and (ii) the complexity of utility pricing, which together can yield optimal control problem formulations that are computationally intractable and difficult to cast and to update upon changes. On the other hand, utility providers usually take two canonical approaches to incentivizing appropriate customer behavior and maximizing their profitability: (i) dynamic pricing and (ii) dynamic capacity modulation.
When these approaches are adapted to public clouds, new challenges arise from the cloud's lack of knowledge of the tenants' demands and responses, and from the idiosyncrasies of tenants' computing resource needs, which are entirely different from those of the traditional utility providers. On the tenant side, public clouds already offer a variety of procurement options, catering to the needs of a growing and diverse body of tenant workloads. The tenant is now confronted with a dizzying array of resource procurement options with tradeoffs among price dynamism, capacity dynamism, and scaling granularity, which implies great cost-saving potential but may also result in tenant procurement problems that grow exponentially in size. In particular, the Amazon EC2 spot instance is a representative virtual machine (VM) offering provisioned with such a price dynamism vs. capacity dynamism tradeoff. Prior works have usually employed simple statistical models to predict spot instance price dynamics, for the sake of tractability. However, they neglect key features of spot instance prices that could be leveraged to further improve the tenant's service contiguity and application performance. Therefore, constructing a scalable yet effective solution is of great importance for cost-effective tenant resource procurement and is one of the main focuses of this dissertation. This dissertation develops solutions for improving both the cloud's and the tenants' cost-efficacy in a public cloud eco-system subject to cloud-tenant interactions. The dissertation is composed of two main parts. In the first part, we focus on the cloud-side control problem: (a) We investigate the problem of optimizing the cloud's operational costs in the face of the scalability limitations of such optimal control problems. We design a hierarchical framework that employs temporal aggregation and spatial (control knob) abstraction to achieve scalability and ease of casting/updating upon changes.
We also provide a suite of algorithms within the framework to deal with workloads with different degrees of predictability. (b) We explore the problem of improving the cloud's profitability via explicit dynamic effective capacity modulation and dynamic pricing. We formulate a leader/follower game-based control framework and evaluate the benefits of the proposed approaches using both trace-driven simulation and lab-based prototype system experiments. Our evaluation also provides useful insights for tenants that may choose such public cloud offerings. In the second part, we focus on cost-effective resource procurement on the tenant side. Since current public cloud providers do not offer VMs with the futuristic properties proposed in this dissertation, we take Amazon EC2 spot instances, on-demand instances, and burstable instances as representative VM offerings from public clouds with similar (yet not the same) properties, and explore the potential cost benefit for tenants that use these options (or a subset of them) collectively. To better assist tenant operations, we identify the key features of spot instance prices and provide scalable data-driven models when applicable. We then present two real-world tenant case studies that incorporate these key features and use these VM options to achieve the desired cost-performance trade-off. We also adapt existing system techniques and develop software suites customized to the specific application properties. To further improve the tenant's cost-efficacy through fine-grained resource scaling, we devise feedback controllers that incorporate the idiosyncrasies of different computing resources. In the evaluation, we show that our proposed approaches can achieve significant cost savings while satisfying application performance targets.