SCHEDULING AND RESOURCE MANAGEMENT FOR NEXT GENERATION CLUSTERS

Open Access
Author:
Zhang, Yanyong
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
June 03, 2002
Committee Members:
  • Anand Sivasubramaniam, Committee Chair
  • Dr Hubertus Frank, Committee Member
  • Guohong Cao, Committee Member
  • Dr Natarajan Gautam, Committee Member
  • Mary Jane Irwin, Committee Member
Keywords:
  • resource management
  • operating systems
  • parallel systems
  • performance evaluation
  • CPU scheduling
  • cluster
Abstract:
Clusters of workstations built with off-the-shelf hardware are playing an important role in mainstream high performance computing. A wide range of applications run on clusters, from traditional scientific applications to more recent commercial ones such as databases, web services, and multimedia. All these applications place heavy processing and/or storage demands on the underlying system, making resource management an important issue for cluster design and deployment. At the same time, many of the environments where clusters are deployed need to accommodate several such applications and users simultaneously. Efficient scheduling and resource management are therefore essential to make clusters more suitable for next generation applications. This thesis makes three main contributions to cluster scheduling and resource management. First, we develop a scheduling mechanism that boosts system utilization to as high as 95% for workloads running at supercomputing centers (primarily scientific applications). Next, we present a novel suite of light-weight cluster scheduling mechanisms that provide better response times to interactive workloads. These mechanisms are better suited to commercial applications (databases and e-commerce) and web services. Here, we have developed a set of simulation tools, analytical models, and implementation tools that can be easily used to study different scheduling schemes, some of which can even be used in self-healing systems. The third contribution is scheduling support for applications that place different Quality-of-Service (QoS) demands on the underlying system.