Robust Techniques for High-dimensional Data: Modern Approaches and Applications
Restricted (Penn State Only)
- Author:
- Tong, Zhaoxue
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- May 17, 2023
- Committee Members:
- Bing Li, Professor in Charge/Director of Graduate Studies
Yanyuan Ma, Major Field Member
Bing Li, Major Field Member
Ming Wang, Outside Unit & Field Member
Runze Li, Chair & Dissertation Advisor
- Keywords:
- feature screening
false discovery rate control
robust regression
precision matrix estimation
heavy-tailed data
model-free
strong oracle property
winsorization
- Abstract:
- This dissertation develops statistical methods for the challenges that noisy, ultra-high-dimensional data pose for statistical modeling. The work focuses on fundamental theory and methodology in high-dimensional data analysis, including feature screening, false discovery rate (FDR) control, robust regression, and precision matrix estimation. The first project introduces a new model-free conditional feature screening approach that is robust to outliers and to heavy-tailed predictors and responses. Additionally, an FDR control procedure is proposed to enhance the performance of the screening procedure. We provide theoretical guarantees for the sure screening and false discovery control performance. We also compare finite-sample performance with existing methods through Monte Carlo simulation studies and a real data example. The second project proposes a new robust estimator that can handle both heavy-tailed predictors and heavy-tailed errors in high-dimensional regression. The estimator employs rank-based regression and winsorizes heavy-tailed predictors, with a focus on reducing the burden of tuning. The work establishes sufficient conditions for statistical consistency and demonstrates the strong oracle property through a second-stage enhancement. Both simulation studies and real data analysis demonstrate good performance. The third project presents a new approach for estimating the precision matrix of high-dimensional heavy-tailed data. The proposed estimator employs winsorized rank-based regression and eliminates the burden of fine-tuning, providing robustness guarantees and computational efficiency. We establish sufficient conditions for statistical consistency and propose a robust variance estimator for heavy-tailed data based on the median-of-means approach, which performs well in simulation studies.
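For readers unfamiliar with two of the building blocks named in the abstract, the following is a minimal Python sketch of generic winsorization and a generic median-of-means estimate. It is not the dissertation's estimator: the function names, the fixed clipping threshold, the interleaved block construction, and the plug-in variance formula are all assumptions chosen for illustration.

```python
import statistics


def winsorize(values, threshold):
    """Clip each value to the interval [-threshold, threshold] (threshold is an assumed tuning constant)."""
    return [max(-threshold, min(threshold, v)) for v in values]


def median_of_means(values, n_blocks):
    """Average each of n_blocks interleaved blocks, then take the median of the block means."""
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    # Block i holds elements i, i + n_blocks, i + 2 * n_blocks, ...
    blocks = [values[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(block) for block in blocks)


def mom_variance(values, n_blocks):
    """Illustrative plug-in variance: MoM estimate of E[X^2] minus the squared MoM estimate of E[X]."""
    second_moment = median_of_means([v * v for v in values], n_blocks)
    first_moment = median_of_means(values, n_blocks)
    # The difference can be slightly negative in small samples, so floor it at zero.
    return max(second_moment - first_moment ** 2, 0.0)


# A single gross outlier dominates the ordinary mean but not the median of means.
sample = [1, 1, 1, 1, 1, 1000]
print(statistics.fmean(sample))
print(median_of_means(sample, 3))  # 1.0
print(winsorize([-5, 0.5, 3], 2))  # [-2, 0.5, 2]
```

In this sketch the outlier lands in one block only, so two of the three block means stay at 1 and the median ignores the contaminated block. How thresholds and block counts are chosen in the dissertation is not stated in the abstract.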