Communication-Driven Coscheduling in Clusters

Open Access
- Author:
- Choi, Gyu Sang
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- October 20, 2005
- Committee Members:
- Chitaranjan Das, Committee Chair/Co-Chair
- Daniel Connell Haworth, Committee Member
- Padma Raghavan, Committee Member
- Guohong Cao, Committee Member
- Andy B Yoo, Committee Member
- Keywords:
- Clusters
- Scheduling
- Abstract:
- The availability of high-bandwidth, low-latency networks, supported by efficient user-level communication protocols, has made High Performance Computing (HPC) clusters an attractive and cost-effective alternative to traditional multiprocessor systems. On any such parallel computing platform, optimally scheduling the processes of a parallel job onto the nodes of the system has always been a challenging problem. The scheduling techniques proposed for clusters range from simple batch scheduling and native local scheduling to more sophisticated techniques such as gang scheduling and communication-driven coscheduling. Unlike batch and gang scheduling, communication-driven coscheduling techniques have been studied in all prior work only on non-dedicated clusters. Thus, the overall objective of this thesis is to design a new coscheduling technique that is scalable, simple, and portable, and that provides better performance than other scheduling techniques in a dedicated cluster. In this context, we have investigated three related issues in this thesis. First, we propose a generic framework that provides a unified interface for implementing any scheduling technique on a cluster. Since the framework requires only minimal modification to the Network Interface Card (NIC) firmware, the NIC device driver, and the communication library, we have implemented three prior and two proposed communication-driven coscheduling schemes based on this framework on a Linux cluster, thus demonstrating the framework's ease of implementation as well as its portability. Second, we propose a new scheduling technique, called Co-ordinated Coscheduling (CC), which accounts for spin-time optimizations at both the sender and receiver ends, and thus minimizes wasted spin time. The results show that blocking-based coscheduling schemes yield approximately 50% lower execution time than spin-based coscheduling schemes.
Moreover, we propose the Hybrid coscheduling (HYBRID) scheme, which combines the intrinsic merits of both gang scheduling and communication-driven coscheduling. We compare the performance and energy consumption of communication-driven coscheduling schemes with those of batch and gang scheduling techniques. Hybrid coscheduling yields completion times that are up to 30% and 100% less than those of gang scheduling and batch scheduling, respectively, across various workloads. In addition, we investigate the impact of memory swapping on the performance of scheduling techniques. With memory-aware allocation, Hybrid coscheduling shows better performance than batch and gang scheduling. This observation indicates that memory swapping is a major problem in employing communication-driven coscheduling. Finally, we exploit a NIC caching scheme for cluster-based Web servers in order to reduce DMA latency and PCI traffic. We have implemented the proposed NIC caching scheme on an 8-node Myrinet-connected Linux cluster and compared its performance with that of the PRESS cluster-based Web server. The results show that the proposed NIC caching scheme yields a 27% throughput improvement over the PRESS model. Traditionally, batch and gang scheduling have been widely used in most clusters, while communication-driven coscheduling had been explored only through simulation-based studies or implementations on small-scale clusters before this study. This is because communication-driven coscheduling was originally proposed for non-dedicated clusters and thus has not been adopted in real deployments. Taken together, these results suggest that a blocking-based coscheduling technique is a viable candidate for use in clusters, offering significant performance and energy benefits compared to batch and gang scheduling. Thus, the main contribution of this thesis is that it shows the advantages of deploying communication-driven coscheduling in commercial systems.