Statistical Network Modeling and its Applications in Complex Large-Scale Systems

Open Access
- Author:
- Agarwal, Amal
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 08, 2020
- Committee Members:
- Lingzhou Xue, Dissertation Advisor/Co-Advisor
Lingzhou Xue, Committee Chair/Co-Chair
David Russell Hunter, Committee Member
Dennis Kon-Jin Lin, Committee Member
Xiang Zhan, Outside Member
Aleksandra B Slavkovic, Committee Member
Ephraim Mont Hanks, Program Head/Chair
- Keywords:
- Clustering
Environmental studies
Exponential-family random graph model
Variational Inference
Dynamic Network
Weighted Network
Pollution Detection
- Abstract:
- Model-based clustering of networks has been a major research topic in large-scale network analysis. Network relational data come in different forms, such as dynamic networks, weighted networks, and bipartite networks. Existing research offers only a handful of modeling frameworks for such data, and those frameworks carry several restrictions. As the network size grows, modeling such complex relationships becomes even harder. Furthermore, deriving useful insights from stream networks in environmental sciences and geoscientific research poses several challenges. It is therefore important to develop effective and efficient statistical methodologies for analyzing large-scale dynamic and weighted networks. In this dissertation, we first propose a scalable framework for time-evolving community detection through dynamic exponential-family random graph models (ERGMs) based on hidden Markov models. We show its application to international trade and email networks. In the second project, we develop a principled nonparametric weighted network model based on ERGMs and local likelihood estimation. This model is motivated by the need to detect pollution in river stream networks, and we show its application to large-scale water pollution analysis in Pennsylvania, USA. In the third project, we develop GeoNet, a validation framework for the nonparametric weighted network model. This geospatial-analysis tool can locally detect statistically significant changes between background and potentially impacted sites. Finally, we describe the computing tools that implement all of the above methods in two R packages, 'netclust' and 'GeoNet'.