TOPICS ON THEORETICAL FOUNDATIONS OF HIGH-DIMENSIONAL STATISTICS

Restricted (Penn State Only)
- Author:
- Yang, Haoyi
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- July 25, 2024
- Committee Members:
- Bing Li, Professor in Charge/Director of Graduate Studies
- Lingzhou Xue, Chair & Dissertation Advisor
- Bing Li, Major Field Member
- Runze Li, Major Field Member
- Qian Chen, Outside Unit & Field Member
- Keywords:
- High-dimensional statistics
- Sparse principal component analysis (Sparse PCA)
- Elastic-Net
- Ladle statistics
- Variable selection
- P-value combination
- Dependent tests
- Asymptotic theory
- Iterative thresholding
- Theoretical guarantees
- High-dimensional data analysis
- Convergence guarantees
- Statistical properties
- Spiked covariance model
- Bootstrap variability
- Simulation studies
- Real-world applications
- Bootstrap
- Abstract:
- This dissertation investigates multiple fundamental theories in statistics, addressing key challenges and proposing novel solutions in high-dimensional data analysis. The research is structured into four distinct yet interrelated chapters, each focusing on a specific statistical problem and its theoretical and practical implications. Chapter 2 presents theoretical guarantees and iterative thresholding techniques for sparse principal component analysis (PCA) using the Elastic-Net. Rigorous convergence guarantees and estimation error bounds are provided for the proposed algorithms, and their utility for high-dimensional data is demonstrated through extensive simulation studies and application to real-world data. Chapter 3 introduces Ladle statistics, a new criterion for variable selection in high-dimensional models. This chapter focuses on the development and theoretical justification of Ladle statistics, demonstrating its effectiveness in identifying relevant variables in complex models through both theoretical analysis and practical applications. Chapter 4 explores the asymptotic theory for combining $p$-values of dependent tests beyond the bivariate normal condition. Traditional methods for combining $p$-values often assume independence or simple dependence structures. This chapter extends these methods to accommodate more complex dependencies and distributions, providing new insights and robust approaches for high-dimensional data analysis. Chapter 5 summarizes the key findings of the dissertation, discusses the theoretical contributions and practical implications of the research, and outlines potential directions for future work, emphasizing the need for further development of high-dimensional statistical methods and their applications. Together, these chapters contribute to a deeper understanding of high-dimensional statistical methods, offering new tools and perspectives for both theoretical research and practical data analysis.