Parallel I/O Profiling and Optimization in HPC Systems

Open Access
Kim, Seong Jo
Graduate Program:
Computer Science and Engineering
Doctor of Philosophy
Document Type:
Date of Defense:
January 24, 2014
Committee Members:
  • Mahmut Taylan Kandemir, Dissertation Advisor
  • Mahmut Taylan Kandemir, Committee Chair
  • Mary Jane Irwin, Committee Member
  • Padma Raghavan, Committee Member
  • Dinghao Wu, Committee Member
  • Rajeev Thakur, Special Member
Keywords:
  • MPI-IO
  • PnetCDF
  • HDF5
  • PVFS
  • I/O Software Stack
  • Code Instrumentation
  • Code Generation
Abstract:
Efficient execution of large-scale scientific applications requires high-performance computing systems designed to meet their I/O requirements. To achieve high performance, such data-intensive scientific applications use a multi-layer I/O software stack that consists of high-level I/O libraries such as PnetCDF and HDF5, the MPI library, and parallel file systems. To design efficient parallel scientific applications, it is essential to understand the complicated flow of I/O operations and the interactions among these libraries. Such comprehension helps identify I/O bottlenecks and thus exploit the potential performance of different layers of the storage hierarchy. To trace the execution of I/O operations and to understand the complex interactions in the I/O stack, we have designed and implemented IOPro, a parallel I/O profiling and visualization framework for high-performance storage systems. IOPro automatically generates an instrumented I/O stack, runs applications on it, and visualizes detailed statistics in terms of user-specified metrics of interest. Next, we introduce IOPin, a dynamic performance visualization and analysis framework for parallel I/O. IOPin instruments the binary code of the I/O stack at runtime with minimal overhead and provides language-independent instrumentation for applications written in C/C++ and Fortran. Furthermore, it requires neither source code modification nor recompilation of the application or the I/O software stack components. Lastly, we propose IOGenie, an automatic parallel I/O code generation and optimization framework for HPC applications. Using a graphical user interface, the tool takes high-level I/O annotations as input, analyzes the given options, and generates optimized I/O code that effectively exercises the underlying I/O stack.
Overall, this thesis proposes three frameworks: IOPro, IOPin, and IOGenie. IOPro and IOPin help users understand the complex interactions across different I/O layers, from applications to the underlying parallel file systems, using static code instrumentation and runtime binary instrumentation, respectively. IOGenie helps users write data-intensive applications easily and effectively and improves the quality of tool-generated code that exploits various optimizations in the underlying I/O software.