Statistical Learning for Robust Topological Inference

Restricted (Penn State Only)
- Author: Vishwanath, Siddharth
- Graduate Program: Statistics
- Degree: Doctor of Philosophy
- Document Type: Dissertation
- Date of Defense: June 24, 2023
- Committee Members:
  - Bharath Kumar Sriperumbudur, Chair & Dissertation Advisor
  - Alexei Novikov, Outside Unit & Field Member
  - Kenji Fukumizu, Special Member
  - Aleksandra Slavkovic, Major Field Member
  - Satoshi Kuriki, Special Member
  - Nicole Lazar, Major Field Member
  - Bing Li, Professor in Charge/Director of Graduate Studies
- Keywords:
  - Topological data analysis
  - Topological inference
  - Statistical learning
  - Robust statistics
  - Generative models
  - Learning on graphs
  - Differential privacy
- Abstract:
Recent advances in computational topology have given rise to the burgeoning field of Topological Data Analysis (TDA), which provides a powerful framework for extracting geometric and topological features from data. Despite its potential, however, the adoption of TDA in mainstream statistical methodology remains limited, and the lack of statistical rigor in the (usually) heuristic TDA routines has only recently been recognized. This dissertation seeks to bridge the gap between TDA and mainstream statistical methodology, and to develop new frameworks for statistical learning and inference from complex, multimodal data. To address this challenge, we focus on four perspectives:

1. We develop a framework for constructing robust persistence diagrams, which form the backbone of most TDA routines. We develop analogues of classical tools from robust statistics to analyze the impact of outliers on the resulting persistence diagrams, and provide refined statistical analyses of their convergence rates in the bottleneck metric. Furthermore, we describe a data-driven procedure for adaptively selecting the tuning parameters, and provide theoretical guarantees for the resulting inference.

2. We investigate statistical inference from graphs under differential privacy, a framework for releasing information while preserving privacy. We use tools from TDA to describe the structure underlying the latent positions of probabilistic graphical models, and investigate the impact of the privacy mechanism on the resulting statistical inference. We highlight the benefit of the topological perspective through several applications.

3. We study the scope and limitations of using topological summaries in place of classical statistical summaries. We characterize a condition called β-equivalence ("Betti-equivalence"), under which the statistical behavior of topological summaries is asymptotically indistinguishable across an entire class of distributions. To this end, we investigate necessary and sufficient conditions under which topological inference is possible in this setting.

4. Lastly, we use tools from TDA to enable more reliable learning of probability measures in generative models. We demonstrate that persistence diagrams provide a compact and informative representation of the underlying structure and can guide the optimization process by preserving essential topological features, leading to stable convergence towards the target measure and more reliable alignment of the probability measure in complex domains.
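The persistence diagrams referenced throughout the abstract can be illustrated in their simplest form. The following is a minimal, self-contained sketch (not the dissertation's construction, which concerns robustness to outliers): the 0-dimensional persistence diagram of a point cloud under the Vietoris–Rips filtration, computed with a union-find pass over edges sorted by length. The function name `h0_persistence` and the example points are hypothetical, for illustration only.

```python
# Sketch: H0 (connected-component) persistence of a point cloud under the
# Vietoris--Rips filtration. Every point is born at scale 0; a component
# dies at the length of the edge that first merges it into another.
import math
from itertools import combinations

def h0_persistence(points):
    """Return the H0 persistence diagram as a list of (birth, death) pairs.

    One component survives all merges; its death is reported as math.inf.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    diagram = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri              # merge: one component dies here
            diagram.append((0.0, length))
    diagram.append((0.0, math.inf))      # the component that never dies
    return diagram

# Two well-separated clusters: short bars record within-cluster merges,
# and one long bar records the scale at which the clusters connect.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
diag = h0_persistence(pts)
```

The long finite bar (death near the inter-cluster gap) is exactly the kind of topological signal that outliers can corrupt, which motivates the robust diagrams of the first contribution.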