New Statistical Tools for High-dimensional Data Modeling
Open Access
- Author:
- Liu, Wanjun
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- April 23, 2019
- Committee Members:
- Runze Li, Dissertation Advisor/Co-Advisor
Runze Li, Committee Chair/Co-Chair
Lingzhou Xue, Committee Member
Xingyuan Fang, Committee Member
Tao Yao, Outside Member
- Keywords:
- High-dimensional data
Hypothesis testing
Feature screening
Regularized method
- Abstract:
- This dissertation consists of two parts. In the first part, we focus on the estimation of a linear functional and its application to projection tests for a high-dimensional mean vector. We first study a general regularized quadratic programming problem with a non-convex penalty and a linear constraint. Deterministic error bounds are established for any stationary point that satisfies the necessary first-order condition. We also propose an ADMM algorithm with local linear approximation to solve such non-convex regularized quadratic programs. In particular, we study a special case of the regularized quadratic programming problem: estimation of a linear functional. Furthermore, we apply the estimated linear functional to perform projection tests for high-dimensional data. Two projection tests are proposed. The first is a projection test based on a data-splitting strategy, which yields an exact $t$-test under the normality assumption. The second is a projection test based on an online framework, which updates the estimate of the optimal projection direction as new observations arrive. This online projection test improves on the power of the data-splitting approach. We derive the asymptotic normal distributions of the online projection test under both the null and the alternative hypotheses. We conduct numerical studies to compare the finite-sample performance of the proposed projection tests with several existing tests. The numerical results show that the proposed projection tests control the type I error rate well and are much more powerful than the existing tests.
In the second part, we focus on model-free feature screening for high-dimensional data via projection correlation. The idea of feature screening is to provide a computationally efficient way to reduce the dimensionality of the feature space from a very high scale to a moderate one while retaining all the important features. The proposed method ranks the projection correlations between the features and the response variable. This screening procedure does not require specifying any regression model and imposes no moment conditions on either the features or the response variable. The theoretical analysis demonstrates that the proposed method enjoys not only the sure screening property but also a stronger result called the rank consistency property. Extensive simulation experiments show that the proposed method outperforms its competitors in various scenarios.
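To illustrate the data-splitting projection test described in the abstract, the following is a minimal sketch, not the dissertation's implementation. It assumes the projection direction is estimated as a ridge-regularized solve of the sample covariance against the sample mean on the first half of the data; this is a simple stand-in for the penalized quadratic programming estimator studied in the dissertation. The function name, the `ridge` parameter, and the 50/50 split are all hypothetical choices. Because the direction is computed from one half only, it is independent of the other half, so the projected observations can be tested with an ordinary one-sample $t$-test.

```python
import numpy as np
from scipy import stats


def data_splitting_projection_test(X, split_frac=0.5, ridge=1.0, seed=0):
    """Test H0: mean vector = 0 by projecting onto a data-driven direction.

    Hypothetical sketch: the direction is a ridge-regularized estimate,
    used here as a stand-in for the dissertation's penalized estimator.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Randomly split the sample into two disjoint parts.
    perm = rng.permutation(n)
    n1 = int(n * split_frac)
    X1, X2 = X[perm[:n1]], X[perm[n1:]]

    # Estimate the projection direction from the first part only.
    mu1 = X1.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)
    w = np.linalg.solve(S1 + ridge * np.eye(p), mu1)

    # Project the second part to one dimension and run a one-sample t-test.
    y = X2 @ w
    res = stats.ttest_1samp(y, popmean=0.0)
    return res.statistic, res.pvalue


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(loc=0.5, size=(200, 50))  # true mean is nonzero
    t_stat, p_value = data_splitting_projection_test(X)
    print(f"t = {t_stat:.3f}, p-value = {p_value:.3g}")
```

The price of the split is that only part of the sample enters the test statistic, which is the loss of power that the online projection test in the abstract is designed to reduce.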