Dimension Reduction and Sufficient Graphical Models

Open Access
- Author:
- Kim, Kyongwon
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- May 12, 2020
- Committee Members:
- Bing Li, Dissertation Advisor/Co-Advisor
Bing Li, Committee Chair/Co-Chair
David Russell Hunter, Committee Member
Bharath Kumar Sriperumbudur, Committee Member
Rongling Wu, Outside Member
Ephraim Mont Hanks, Program Head/Chair
- Keywords:
- Sufficient Dimension Reduction
Post Dimension Reduction Statistical Inference
Graphical Models
Reproducing Kernel Hilbert Space
- Abstract:
- The methods I develop in my thesis are based on linear or nonlinear sufficient dimension reduction. The basic principle of linear sufficient dimension reduction is to extract a small number of linear combinations of the predictor variables that represent the original predictors without loss of information about the conditional distribution of the response given the predictors. Nonlinear sufficient dimension reduction generalizes this idea to the nonlinear setting. I focus on applying sufficient dimension reduction to two areas: regression modeling and graphical models. The first project concerns statistical inference in the regression context after sufficient dimension reduction; the second applies nonlinear sufficient dimension reduction to statistical graphical models widely used in machine learning. The projects share a common theme: identifying areas where sufficient dimension reduction can be applied and establishing the statistical theory behind those applications.
My first project is about post sufficient dimension reduction statistical inference. The methodologies of sufficient dimension reduction have undergone extensive development in the past three decades. However, there has been a lack of systematic and rigorous development of post dimension reduction inference, which has seriously hindered its applications. The current common practice is to treat the estimated sufficient predictors as the true predictors and use them as the starting point of the downstream statistical inference. However, this naive approach grossly overestimates the confidence level of an interval, or the power of a test, leading to distorted results.
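The linear sufficient dimension reduction principle described above can be illustrated with a minimal sketch of sliced inverse regression (SIR), one of the classical estimators of the central subspace. This is an illustrative implementation, not the thesis's methodology; the function name `sir` and the choices of slice count and direction count are my own.

```python
import numpy as np

def sir(X, y, n_slices=10, n_directions=2):
    """Sliced inverse regression: estimate a basis for the subspace
    spanned by the sufficient linear combinations of the predictors.

    Illustrative sketch only; no influence-function machinery here.
    """
    n, p = X.shape
    # Center and whiten the predictors: Z = (X - mu) Sigma^{-1/2}
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ Sigma_inv_sqrt
    # Slice the observations by the value of y and average Z in each slice
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of the slice-mean covariance, mapped back
    # to the original predictor scale (beta = Sigma^{-1/2} eta)
    _, vecs = np.linalg.eigh(M)        # eigh returns ascending order
    eta = vecs[:, ::-1][:, :n_directions]
    return Sigma_inv_sqrt @ eta
```

Each column of the returned matrix is one estimated sufficient linear combination; the naive practice criticized above would plug these estimated directions into a downstream regression as if they were known exactly.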
In this project, we develop a general and comprehensive framework for post dimension reduction inference that can accommodate any dimension reduction method and any model building method, as long as their corresponding influence functions are available. Within this framework, we derive the influence functions and present explicit post reduction formulas for combinations of numerous dimension reduction and model building methods. We then develop post reduction inference procedures for both confidence intervals and hypothesis testing, and we investigate their finite-sample performance through simulations and a real data analysis.
My second project applies nonlinear sufficient dimension reduction techniques to graphical models. We introduce the sufficient graphical model by applying recently developed nonlinear sufficient dimension reduction techniques to the evaluation of conditional independence. The sufficient graphical model is nonparametric in nature, as it does not make distributional assumptions such as the Gaussian or copula Gaussian assumptions. However, unlike fully nonparametric graphical models, which rely on a high-dimensional kernel to characterize conditional independence, our graphical model is based on conditional independence given a set of sufficient predictors of substantially reduced dimension. In this way, we avoid the curse of dimensionality that comes with a high-dimensional kernel. We develop the population-level properties, convergence rate, and consistency of our estimator. Through simulation comparisons and an analysis of the DREAM 4 Challenge data set, we demonstrate that our method outperforms existing methods when the Gaussian or copula Gaussian assumptions are violated, and that its performance remains excellent in the high-dimensional setting.
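The structure of the sufficient graphical model can be sketched as follows: for each pair of variables, reduce the remaining variables to a few sufficient predictors and test conditional independence given only that reduced set. The sketch below is a heavily simplified stand-in: it substitutes PCA for the nonlinear sufficient dimension reduction step and a partial-correlation threshold for the actual conditional independence measure, and the function name `pairwise_graph_sketch` and the threshold are my own choices.

```python
import numpy as np

def pairwise_graph_sketch(X, n_components=2, threshold=0.1):
    """Build an undirected graph by testing each pair (i, j) for
    conditional independence given a low-dimensional reduction of the
    remaining variables.

    PCA stands in for nonlinear sufficient dimension reduction, and a
    residual-correlation threshold stands in for the kernel-based
    conditional independence test; both are illustrative only.
    """
    n, p = X.shape
    edges = set()
    for i in range(p):
        for j in range(i + 1, p):
            rest = np.delete(X, [i, j], axis=1)
            # Reduce the conditioning set to a few directions
            # (stand-in for the sufficient predictors).
            Xc = rest - rest.mean(axis=0)
            _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
            S = Xc @ Vt[:n_components].T
            # Residuals of X_i and X_j after regressing on S
            design = np.column_stack([np.ones(n), S])
            targets = np.column_stack([X[:, i], X[:, j]])
            coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
            resid = targets - design @ coef
            r = np.corrcoef(resid[:, 0], resid[:, 1])[0, 1]
            if abs(r) > threshold:
                edges.add((i, j))
    return edges
```

Because the conditioning happens only through the reduced predictors `S`, each pairwise test works in a low-dimensional space, which is the mechanism by which the actual method sidesteps the curse of dimensionality of a fully high-dimensional kernel.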