Topics in High-dimensional Statistical Inference
Open Access
- Author:
- Li, Changcheng
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- April 23, 2019
- Committee Members:
- Runze Li, Dissertation Advisor/Co-Advisor
- Runze Li, Committee Chair/Co-Chair
- Bing Li, Committee Member
- Xingyuan Fang, Committee Member
- Tao Yao, Outside Member
- Keywords:
- Hotelling T2 test
- Multiple sample mean test
- Projection test
- Two-sample mean test
- Causal structural learning
- Causal discovery
- Causal graphical models
- D-separation set
- Generalized Method of Moments
- Abstract:
- Statistical inference is the process of drawing conclusions about populations from data; it is a critical step in transforming data into knowledge. In this dissertation, we propose new methods for two topics in high-dimensional statistical inference: hypothesis testing and causal structural learning.

In the first part of the dissertation, we propose a new projection test for linear hypotheses on regression coefficient matrices in linear models with high-dimensional responses, and we systematically study its theoretical properties. We first derive the optimal projection matrix for any given projection dimension to achieve the best power, and we provide an upper bound on the optimal dimension of the projection matrix. We further provide insights into how to construct the optimal projection matrix. One- and two-sample mean problems can be formulated as special cases of linear hypotheses on regression coefficient matrices in linear models (see the first sketch following the abstract). We demonstrate both theoretically and empirically that the proposed test can outperform existing tests for one- and two-sample mean problems. We conduct Monte Carlo simulations to examine the finite-sample performance and illustrate the proposed test with a real data example.

In the second part of the dissertation, we propose a novel constraint-based causal structural learning algorithm for high-dimensional Gaussian linear causal graphical models. Existing constraint-based approaches such as the PC algorithm remove edges between vertices by carrying out conditional independence tests over all candidate d-separation sets, which can be computationally expensive and has exponential worst-case complexity (see the last sketch following the abstract). To tackle these issues, we propose a regularized approach, called Focused Generalized Method of Moments (FGMM), to identify d-separation sets between vertices. Regularized approaches such as the Lasso and SCAD are widely used for feature selection and can be used to identify Markov blankets in causal graphical models; however, a Markov blanket contains spouses in addition to true neighbors, and the corresponding edges still have to be removed by searching for d-separation sets. In contrast to existing regularized approaches, the FGMM approach uses moment conditions to identify d-separation sets directly. Furthermore, we propose an iterative linear approximation algorithm to solve the optimization problem in the FGMM approach efficiently, and we develop skeleton and structural learning algorithms based on the FGMM method. We prove the consistency of the FGMM algorithm in high-dimensional settings. The advantages of the proposed FGMM algorithm in accuracy and efficiency are further confirmed by Monte Carlo simulation studies on various benchmark networks.
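As an illustrative sketch of the reduction mentioned in the first part, and using generic notation that need not match the dissertation's, the two-sample mean problem can be written as a linear hypothesis on a coefficient matrix in a multivariate linear model:

```latex
% Generic notation (an illustrative assumption, not necessarily the dissertation's):
% multivariate linear model with matrix-valued response
%   Y = X B + E,  with  Y \in \mathbb{R}^{n \times p},  B \in \mathbb{R}^{q \times p},
% and a general linear hypothesis  H_0 : C B = 0  for a known contrast matrix C.
\[
  Y = XB + E, \qquad
  X = \begin{pmatrix} \mathbf{1}_{n_1} & \mathbf{0} \\ \mathbf{0} & \mathbf{1}_{n_2} \end{pmatrix},
  \qquad
  B = \begin{pmatrix} \mu_1^{\top} \\ \mu_2^{\top} \end{pmatrix}.
\]
% Taking C = (1, -1) turns the linear hypothesis into the two-sample mean hypothesis:
\[
  H_0 : CB = 0 \quad\text{with } C = (1,\,-1)
  \quad\Longleftrightarrow\quad
  H_0 : \mu_1 = \mu_2 .
\]
% One-sample case: X = \mathbf{1}_n, B = \mu^{\top}, C = 1, so H_0 : \mu = 0.
```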
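The projection-test idea in the first part reduces a p-dimensional mean comparison to a low-dimensional one before applying a classical test. Below is a minimal generic sketch for the two-sample case, assuming a projection matrix A is already given: project both samples with a p × k matrix A and apply Hotelling's T² to the projected data. The function name is a hypothetical placeholder, and choosing A (and its dimension k) optimally is exactly what the dissertation studies; that choice is not implemented here.

```python
import numpy as np
from scipy import stats


def projection_two_sample_test(X1, X2, A):
    """Generic projection-based two-sample mean test (illustrative sketch only).

    X1 : (n1, p) array, sample from population 1
    X2 : (n2, p) array, sample from population 2
    A  : (p, k) projection matrix with 2 <= k << p; how to choose A optimally
         is the question studied in the dissertation, so here it is an input.
    Returns Hotelling's T^2 on the projected data and an F-based p-value.
    """
    n1, n2 = X1.shape[0], X2.shape[0]
    k = A.shape[1]
    Z1, Z2 = X1 @ A, X2 @ A                      # project both samples to k dims
    diff = Z1.mean(axis=0) - Z2.mean(axis=0)     # difference of projected means
    # Pooled sample covariance of the projected data.
    S = ((n1 - 1) * np.cov(Z1, rowvar=False) +
         (n2 - 1) * np.cov(Z2, rowvar=False)) / (n1 + n2 - 2)
    T2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S, diff)
    # Classical F transformation of Hotelling's T^2.
    df1, df2 = k, n1 + n2 - 1 - k
    F = T2 * df2 / ((n1 + n2 - 2) * df1)
    return T2, stats.f.sf(F, df1, df2)
```

For instance, a placeholder projection such as `A = np.linalg.qr(np.random.randn(p, 5))[0]` makes the sketch runnable; the dissertation's contribution is replacing such an arbitrary A with the power-optimal one.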
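To make the bottleneck described in the second part concrete, here is a generic sketch of the exhaustive d-separation-set search performed by PC-style algorithms, using the standard Fisher-z partial-correlation test for Gaussian data. The function names, the candidate set, and the significance level `alpha` are illustrative assumptions; this is the baseline behavior the FGMM approach avoids, not the dissertation's method.

```python
import numpy as np
from itertools import combinations
from scipy import stats


def fisher_z_ci_test(corr, n, i, j, S):
    """Fisher-z test of conditional independence X_i _||_ X_j | X_S for
    Gaussian data, given the (p, p) sample correlation matrix `corr` and the
    sample size n. Returns the p-value of the test."""
    idx = [i, j] + list(S)
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])          # precision of sub-block
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])    # partial correlation
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(S) - 3)
    return 2 * stats.norm.sf(abs(z))


def find_separating_set(corr, n, i, j, candidates, max_size, alpha=0.01):
    """Exhaustive PC-style search for a d-separation set between i and j.
    `candidates` are potential separators (e.g., current neighbors of i or j,
    excluding i and j). The number of subsets examined grows combinatorially,
    which is the cost the FGMM approach is designed to avoid."""
    for size in range(max_size + 1):
        for S in combinations(candidates, size):
            if fisher_z_ci_test(corr, n, i, j, S) > alpha:
                return set(S)      # independence not rejected: S separates i, j
    return None                    # no separating set found; keep the edge
```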