Understanding Hydrologic Functions of Water Storage and Soil Moisture and Improving Predictive Capability using Big Data Machine Learning Methods

Open Access
Fang, Kuai
Graduate Program: Civil Engineering
Degree: Doctor of Philosophy
Document Type:
Date of Defense: September 07, 2018
Committee Members:
  • Chaopeng Shen, Dissertation Advisor
  • Chaopeng Shen, Committee Chair
  • Xiaofeng Liu, Committee Member
  • Klaus Keller, Committee Member
  • Daniel Kifer, Outside Member
Keywords:
  • Hydrology
  • Watershed Functioning
  • Water Storage
  • Soil Moisture
  • Machine Learning
  • Deep Learning
Water storage and soil moisture are important in the water cycle, not only for environmental and social applications but also as state variables that modulate hydrologic fluxes. Recent advances in remote sensing, e.g., the Gravity Recovery and Climate Experiment (GRACE) and Soil Moisture Active Passive (SMAP) missions, have provided observations of subsurface water content never seen before. Big data machine learning (BDML) is a powerful approach for extracting patterns and searching for linkages within data. However, little attention has been paid to using BDML methods to interpret and predict water content dynamics from these latest satellite products. In addition, a recent breakthrough in BDML, known as Deep Learning (DL), has substantially improved modeling accuracy and data efficiency compared with earlier machine learning methods. While DL has achieved unprecedented success in various disciplines, its potential in water science has not been fully recognized. Here we first introduce hydrologic signatures extracted from GRACE that help improve estimates of water partitioning based on the Budyko hypothesis. We then examine the relationship between GRACE storage and basin runoff by proposing a storage-streamflow correlation spectrum (SSCS). We ask (i) what SSCS patterns exist over the contiguous United States (CONUS) and (ii) what factors control these patterns. We find that SSCS patterns provide important clues about hydrologic processes and geologic characteristics. For example, storage on the Appalachian Plateau is limited by thin soils, compacted soils in northern Ohio lead to a shallow water table that limits storage, and streamflow in the northern Great Plains and Southeast Atlantic regions is dominated by groundwater. By interpreting SSCS patterns with classification and regression trees (CARTs), we can inspire, corroborate, or reject hypotheses about functional basin behaviors. However, CARTs have very low data efficiency.
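CARTs lose data efficiency because every split partitions the training samples, so with roughly balanced splits a node at depth d sees only about n/2^d of the n samples. The following is a minimal pure-Python sketch, assuming a single-output regression tree grown with greedy sum-of-squared-error splits on synthetic data. The function names `sse`, `best_split`, and `grow`, the three hypothetical predictors, and all parameter values are illustrative and are not the dissertation's setup. The sketch records how many samples reach each node at each depth:

```python
import random
from collections import defaultdict


def sse(values):
    """Sum of squared errors about the mean: CART's regression impurity."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, min_leaf):
    """Return the (feature, threshold) pair that most reduces SSE, or None.

    min_leaf (>= 1) is the smallest number of samples allowed in a child.
    """
    best, best_cost = None, sse(y)
    for j in range(len(X[0])):
        order = sorted(range(len(y)), key=lambda i: X[i][j])
        # Each candidate split keeps at least min_leaf samples on both sides.
        for k in range(min_leaf, len(y) - min_leaf + 1):
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # tied values cannot be separated by a threshold
            cost = sse([y[i] for i in order[:k]]) + sse([y[i] for i in order[k:]])
            if cost < best_cost:
                best, best_cost = (j, (lo + hi) / 2), cost
    return best


def grow(X, y, depth, max_depth, min_leaf, counts):
    """Grow the tree recursively, appending each node's sample count to counts[depth]."""
    counts[depth].append(len(y))
    if depth == max_depth:
        return
    split = best_split(X, y, min_leaf)
    if split is None:
        return  # no SSE-reducing split leaves both children large enough
    j, t = split
    for side in (lambda v: v <= t, lambda v: v > t):
        idx = [i for i in range(len(y)) if side(X[i][j])]
        grow([X[i] for i in idx], [y[i] for i in idx], depth + 1, max_depth, min_leaf, counts)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 256
    # Synthetic stand-ins for basin attributes (3 predictors) and a noisy response.
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    y = [2.0 * x[0] + (1.0 if x[1] > 0.5 else 0.0) + rng.gauss(0.0, 0.1) for x in X]
    counts = defaultdict(list)
    grow(X, y, depth=0, max_depth=6, min_leaf=5, counts=counts)
    for d in sorted(counts):
        c = counts[d]
        print(f"depth {d}: {len(c):2d} nodes, {sum(c) / len(c):6.1f} samples/node on average")
```

On data like this, the printed averages typically fall by roughly half per level. As a result, splits near the leaves are chosen from only a handful of points.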
The number of data points in the lower branches is exponentially smaller than in the upper ones, so the lower levels are highly unstable and hard to interpret. In the following section, we propose a novel data-driven technique, time series deep learning, to capture how surface soil moisture dynamics respond to atmospheric inputs, human interventions, and subsurface feedbacks. We demonstrate the effectiveness and fidelity of a form of DL called Long Short-Term Memory (LSTM) for spatiotemporally prolonging satellite-sensed soil moisture. This prolonged dataset could be used to relate past extreme events to soil moisture dynamics. Furthermore, using LSTM, we extended the SMAP product years beyond its lifespan, achieving performance similar to that of the training period. A fused product combining this long-term LSTM projection with model simulations outperforms either of its components as well as combinations of different models. This result reveals the potential of DL for estimating long-term dynamics and for integrating models with data.
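The LSTM recurrence lets the network carry soil moisture memory across time steps, and it can be sketched as a forward pass in NumPy. This sketch assumes a single layer, standardized forcing inputs, a linear output head, and untrained random weights. The class `LSTMRegressor`, the input variables, and all sizes are hypothetical and are not the configuration used in the dissertation. Training (e.g., backpropagation through time against SMAP observations) is omitted:

```python
import numpy as np


def sigmoid(z):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + np.exp(-z))


class LSTMRegressor:
    """Single-layer LSTM followed by a linear head (hypothetical, untrained sketch)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n_hidden)
        # Stacked weights for the input (i), forget (f), candidate (g), output (o) gates.
        self.W = rng.uniform(-scale, scale, (4 * n_hidden, n_in))
        self.U = rng.uniform(-scale, scale, (4 * n_hidden, n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.w_out = rng.uniform(-scale, scale, n_hidden)
        self.b_out = 0.0
        self.n_hidden = n_hidden

    def forward(self, x_seq):
        """Map a (T, n_in) forcing series to a length-T output series."""
        H = self.n_hidden
        h = np.zeros(H)  # hidden state
        c = np.zeros(H)  # cell state
        outputs = []
        for x_t in x_seq:
            z = self.W @ x_t + self.U @ h + self.b
            i = sigmoid(z[:H])
            f = sigmoid(z[H:2 * H])
            g = np.tanh(z[2 * H:3 * H])
            o = sigmoid(z[3 * H:])
            c = f * c + i * g    # forget gate decides how much past memory survives
            h = o * np.tanh(c)   # gated view of the cell state passed to the head
            outputs.append(self.w_out @ h + self.b_out)
        return np.array(outputs)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T = 30
    # Hypothetical standardized daily forcings, e.g., precipitation, temperature, radiation.
    forcings = rng.standard_normal((T, 3))
    model = LSTMRegressor(n_in=3, n_hidden=16)
    surface_sm = model.forward(forcings)
    print(surface_sm.shape)  # (30,)
```

The gating is the key design choice. Because the forget gate `f` scales the previous cell state at every step, the network can retain slowly varying, storage-like information over many time steps. A plain recurrent network tends to lose such long-range dependencies.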