Inference Methods for High-Dimensional Data: A Focus On Different Hypothesis Testing Problems
Restricted (Penn State Only)
- Author:
- Zhang, Zhe
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- April 24, 2023
- Committee Members:
- Murali Haran, Program Head/Chair
Bing Li, Major Field Member
Zhibiao Zhao, Major Field Member
Ming Wang, Outside Unit & Field Member
Runze Li, Chair & Dissertation Advisor
- Keywords:
- statistical inference
high dimension
marginal coordinate hypothesis
sufficient dimension reduction
partial linear model
power enhancement
linear hypothesis
- Abstract:
- This dissertation aims to develop new statistical inference procedures for high-dimensional regression models and focuses on three fundamental problems: (a) individual hypothesis testing without specifying the high-dimensional regression model, (b) high-dimensional linear hypothesis testing in linear regression models, and (c) individual hypothesis testing in partial linear models.
In Chapter 3, we propose an effective model-free inference procedure for high-dimensional regression models. We first reformulate the hypothesis testing problem within the sufficient dimension reduction framework. Building on this reformulation, we propose a new test statistic and show that its asymptotic distribution is a $\chi^2$ distribution whose degrees of freedom do not depend on the unknown population distribution. We further conduct a power analysis under local alternative hypotheses. In addition, we study how to control the false discovery rate of the proposed chi-squared tests, which are correlated, in order to identify important predictors under a model-free framework. To this end, we propose a multiple testing procedure and establish its theoretical guarantees. Monte Carlo simulation studies are conducted to assess the performance of the proposed tests, and an empirical analysis of a real-world data set illustrates the proposed methodology.
In Chapter 4, we present a novel transformation-based inference method for linear hypothesis testing in high-dimensional linear regression models. Our method uses score functions to construct a new random vector, which links tests of high-dimensional coefficients to high-dimensional one-sample mean tests. We formulate a U-statistic with a kernel of order two and establish its asymptotic normality. The presence of high-dimensional nuisance parameters poses a significant challenge in this setting; however, we show that their impact is asymptotically negligible under mild conditions.
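The high-dimensional one-sample mean test that Chapter 4 reduces coefficient testing to can be illustrated with a generic order-two U-statistic. This is a minimal sketch, not the dissertation's transformation-based procedure: it assumes i.i.d. observations $X_1, \dots, X_n \in \mathbb{R}^p$, tests $H_0: \mu = 0$ with $T_n = \frac{1}{n(n-1)}\sum_{i \ne j} X_i^\top X_j$, and standardizes by an estimate of the null variance $2\,\mathrm{tr}(\Sigma^2)/\{n(n-1)\}$. The function names are hypothetical, and the normal calibration assumes the usual high-dimensional conditions (for example, $\mathrm{tr}(\Sigma^4) = o\{\mathrm{tr}^2(\Sigma^2)\}$).

```python
import math
import random


def one_sample_u_statistic(data):
    """Order-two U-statistic T_n = sum_{i != j} X_i^T X_j / (n (n - 1)).

    T_n is unbiased for ||mu||^2; dropping the i == j terms removes the
    bias that the diagonal terms ||X_i||^2 would otherwise introduce.
    """
    n, p = len(data), len(data[0])
    # Sum over ordered pairs i != j equals ||sum_i X_i||^2 - sum_i ||X_i||^2.
    col_sums = [sum(row[k] for row in data) for k in range(p)]
    total_sq = sum(s * s for s in col_sums)
    diag = sum(sum(x * x for x in row) for row in data)
    return (total_sq - diag) / (n * (n - 1))


def trace_sigma_sq_hat(data):
    """Estimate tr(Sigma^2) by the average of (X_i^T X_j)^2 over i != j.

    This estimator is unbiased for tr(Sigma^2) only when mu = 0, i.e. under H0.
    """
    n = len(data)
    acc = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                d = sum(a * b for a, b in zip(data[i], data[j]))
                acc += d * d
    return acc / (n * (n - 1))


def standardized_mean_test(data):
    """Return (z, one-sided p-value) for H0: mu = 0 via a normal approximation."""
    n = len(data)
    t_n = one_sample_u_statistic(data)
    var_hat = 2.0 * trace_sigma_sq_hat(data) / (n * (n - 1))
    z = t_n / math.sqrt(var_hat)
    # Large positive T_n is evidence against H0, so use the upper tail P(Z > z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


# Usage: a null sample and a dense alternative with every coordinate shifted by 0.3.
random.seed(0)
n, p = 40, 200
null_data = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
alt_data = [[random.gauss(0.3, 1.0) for _ in range(p)] for _ in range(n)]
print(standardized_mean_test(null_data))
print(standardized_mean_test(alt_data))
```

The pairwise sum in $T_n$ is computed through $\|\sum_i X_i\|^2 - \sum_i \|X_i\|^2$, which avoids a double loop for the statistic itself; only the variance estimate uses the pairwise loop.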
Additionally, we study how the power enhancement term affects power, through both theoretical analysis and simulations. The results indicate that the enhancement term does not affect the type I error rate and can improve power in scenarios where the U-statistic may not perform well.
In Chapter 5, we consider testing the treatment effect in high-dimensional partial linear models. Because estimators of the unknown nuisance function obtained from some machine learning algorithms converge slowly, we cannot directly estimate the nuisance function and plug it in using the same data. To overcome this limitation, we update the estimate of the nuisance function recursively, which leads to explicit expressions for the estimators of the parameters of interest. We establish the asymptotic normality of the proposed approach and assess its finite-sample performance through simulations. The results indicate that our statistic offers higher power in cases of model misspecification.
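The role of the power enhancement term in Chapter 4 can be illustrated with a screening-type component that is zero under the null with probability tending to one but grows under sparse alternatives, which is why it leaves the type I error rate unaffected. The abstract does not give the dissertation's own term, so this is a hedged sketch in the spirit of the screening-based power enhancement of Fan, Liao and Yao (Econometrica, 2015): $J_0 = \sqrt{p}\sum_k t_k^2\,\mathbf{1}\{|t_k| > \delta_{n,p}\}$ with coordinatewise t-statistics $t_k$. The threshold $\delta_{n,p} = \log(\log n)\sqrt{2\log p}$ and all function names are illustrative assumptions.

```python
import math
import random
import statistics


def power_enhancement_term(data):
    """Screening-based enhancement J0 = sqrt(p) * sum_k t_k^2 * 1{|t_k| > delta}.

    t_k = sqrt(n) * mean_k / sd_k is the t-statistic for H0: mu_k = 0, and
    delta = log(log n) * sqrt(2 log p) is a slowly diverging threshold
    (an illustrative choice; requires n >= 3). J0 is nonnegative and equals
    0 when no coordinate passes the threshold.
    """
    n, p = len(data), len(data[0])
    delta = math.log(math.log(n)) * math.sqrt(2.0 * math.log(p))
    total = 0.0
    for k in range(p):
        column = [row[k] for row in data]
        t_k = math.sqrt(n) * statistics.fmean(column) / statistics.stdev(column)
        if abs(t_k) > delta:
            total += t_k * t_k
    return math.sqrt(p) * total


def enhanced_statistic(base_stat, data):
    """Add J0 to any base statistic that is asymptotically pivotal under H0."""
    return base_stat + power_enhancement_term(data)


# Usage: a null sample, and a sparse alternative with one strong coordinate.
random.seed(0)
n, p = 100, 500
null_data = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
sparse_alt = [row[:] for row in null_data]
for row in sparse_alt:
    row[0] += 1.0  # shift only coordinate 0
print(power_enhancement_term(null_data))
print(power_enhancement_term(sparse_alt))
```

A quadratic statistic such as a U-statistic spreads a single strong signal across all $p$ coordinates and can miss it, whereas $J_0$ picks it up directly; because $J_0$ vanishes when nothing passes the threshold, adding it to the base statistic does not change the null calibration asymptotically.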