# SAVE: A Scalable Archival and Visualization Environment for Large-Scale Scientific Computing Applications

Open Access

- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- September 03, 2004
- Committee Members:
- Paul E. Plassmann, Committee Chair
- Daniel Connell Haworth, Committee Member
- Padma Raghavan, Committee Member
- Hongyuan Zha, Committee Member
- Raj Acharya, Committee Member

- Keywords:
- scientific computing
- large data set
- visualization
- computational steering

- Abstract:
- Large-scale computer simulations are playing an increasingly important role in many areas of science and engineering. A central problem for these simulations is that, when run on parallel computers, the tremendous amount of data generated is often impossible to archive or analyze easily. Because of the way these simulations are written and run, it is often difficult for application scientists and engineers to make specific queries of the simulation results without importing a monolithic file of the simulation's state. Additional problems include the difficulty of comparing results from different solvers, monitoring the current running status of the application, and modifying simulation parameters without restarting the program. The aim of this thesis is to develop the algorithms and software tools required to address these problems. The Scalable Archiving and Visualizing Environment (SAVE) is a software system that integrates an efficient, fast data archiving scheme with a computational steering environment. The system targets the on-line archiving, visualization, monitoring, and steering of large-scale scientific simulations running on parallel computing clusters such as Beowulf systems. The fast archiving scheme is based on a functional representation of the numerical simulation results: the simulation data is approximated by simple mathematical models, such as multivariable polynomials, which once computed are easy to archive, visualize, and query. Both $L_2$- and $L_\infty$-norm error tolerances can be used in the data approximation. The $L_2$-norm, or least-squares, solution is relatively straightforward to compute; however, the $L_\infty$-norm, or minimax, solution is preferred in many applications because of its more evenly distributed approximation error. A new, efficient algorithm is developed to approximate high-dimensional discrete data on structured meshes in either the least-squares or the minimax sense.
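The least-squares side of this functional representation can be sketched in a few lines. The snippet below is an illustrative sketch only (the field, sizes, and polynomial order are made up for the demo, and it is not the thesis code): it fits a low-order polynomial to a 1-D grid of samples in the $L_2$ sense, so that only the coefficients need to be archived.

```python
import numpy as np

# Illustrative sketch of functional representation (not SAVE's implementation):
# approximate 1-D grid data by a low-order polynomial in the least-squares
# (L2) sense and archive only the coefficients instead of the raw samples.
n, degree = 1000, 4                      # sample count and polynomial order (demo values)
x = np.linspace(0.0, 1.0, n)
data = np.exp(-x) * np.sin(4.0 * x)      # stand-in for one simulation field

A = np.vander(x, degree + 1)             # Vandermonde (monomial) basis matrix
coeffs, *_ = np.linalg.lstsq(A, data, rcond=None)
recon = A @ coeffs                       # field reconstructed from coefficients

max_err = float(np.max(np.abs(recon - data)))   # L_inf error of the L2 fit
compression_ratio = n / coeffs.size             # samples stored vs. coefficients stored
print(f"max error = {max_err:.2e}, compression ratio = {compression_ratio:.0f}x")
```

Note that the reconstruction error is measured here in the $L_\infty$ norm even though the fit minimizes the $L_2$ norm; a minimax fit would distribute that worst-case error more evenly.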
Significant savings in computation and memory can be achieved by exploiting the special Kronecker product structure of the coefficient matrix in the approximation. To extend the method to applications based on unstructured meshes, the approximation is performed on a sampled training set and verified on an independent testing set. For examples from a turbulent combustion application, we demonstrate that, at typical error tolerances, the functional representation approach obtains large compression ratios for multi-dimensional data. The compression ratio increases with the approximation tolerance and the order of the polynomials; however, the results show that a high-order polynomial does not always yield a better compression ratio, because such polynomials can generate ill-conditioned coefficient matrices that are difficult to solve accurately. A new, efficient parallel iso-surface visualization algorithm is presented that takes advantage of the special data structure of the functional representation. This method differs from marching cubes in that the iso-surface is computed by solving a set of Ordinary Differential Equations (ODEs) rather than by interpolating over grid elements. Oriented glyphs are attached to the solution points to visualize the iso-surface, with the glyphs' normal directions computed efficiently from the underlying polynomials. Good speedups are obtained for the algorithm on Beowulf computing clusters, making the method especially useful for large-scale scientific applications. To enhance the usability of SAVE, a cross-platform Graphical User Interface (GUI) is developed to present information from parallel simulations on the fly, accept user input to steer their running behavior, make on-line and off-line queries of simulation results, provide on-line visualization, and automate animations.
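The Kronecker-product savings can be illustrated in two dimensions. For a tensor-product polynomial basis on a structured grid, the least-squares coefficient matrix is $A_2 \otimes A_1$, so the 2-D fit factors into two small 1-D pseudo-inverse solves and the large Kronecker matrix never needs to be formed. The sketch below (made-up field and sizes, not the thesis algorithm itself) checks the separable solve against an explicit Kronecker solve:

```python
import numpy as np

# Sketch of the Kronecker-product structure for structured meshes
# (illustrative only; the field and basis sizes are chosen for the demo).
n1, n2, m1, m2 = 60, 50, 5, 4
x1 = np.linspace(0.0, 1.0, n1)
x2 = np.linspace(0.0, 1.0, n2)
A1 = np.vander(x1, m1)                     # 1-D basis in the first coordinate
A2 = np.vander(x2, m2)                     # 1-D basis in the second coordinate
F = np.cos(x1[:, None] + x2[None, :])      # sample field f(x1, x2) = cos(x1 + x2)

# Separable solve: vec(C) = (pinv(A2) kron pinv(A1)) vec(F),
# i.e. C = pinv(A1) @ F @ pinv(A2).T -- two small solves, no Kronecker matrix.
C = np.linalg.pinv(A1) @ F @ np.linalg.pinv(A2).T

# Reference solve with the explicit (n1*n2) x (m1*m2) Kronecker matrix.
K = np.kron(A2, A1)
c_full, *_ = np.linalg.lstsq(K, F.flatten(order="F"), rcond=None)

print("max fit error:", float(np.max(np.abs(A1 @ C @ A2.T - F))))
```

The separable route replaces one dense solve of size $(n_1 n_2) \times (m_1 m_2)$ with two solves of sizes $n_1 \times m_1$ and $n_2 \times m_2$, which is the source of the computation and memory savings in higher dimensions as well.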
A client-server manager is developed for easy management of parallel simulations running on Beowulf computing clusters. SAVE has been tested on a sophisticated Computational Fluid Dynamics (CFD) application, NTMIX3D, a high-order solver for reacting turbulent flow that can run in Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) configurations. The results demonstrate the utility of SAVE as an efficient software tool for large-scale simulations on high-performance parallel systems.