Statistical Learning with Conditional Independence and Graphical Models
Restricted (Penn State Only)
- Author:
- Sheng, Tianhong
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 07, 2022
- Committee Members:
- Ephraim Mont Hanks, Professor in Charge/Director of Graduate Studies
David Hunter, Major Field Member
Bharath Kumar Sriperumbudur, Co-Chair & Dissertation Advisor
Rongling Wu, Outside Unit & Field Member
Bing Li, Co-Chair & Dissertation Advisor
- Keywords:
- Conditional Independence
Graphical Models
Reproducing Kernel Hilbert Spaces
Cross-validation
- Abstract:
- Conditional independence is one of the most fundamental mechanisms underlying many statistical methods, such as probabilistic graphical models, causal discovery, feature selection, dimensionality reduction, and Bayesian network learning. The three topics in this thesis all concern exploring, testing, and validating conditional independence. On the theoretical side, we study the properties of different types of measures of conditional independence and the connections between kernel-based and distance-based measures. On the methodological side, we develop a new class of probabilistic graphical models based on conditional independence under a skewed Gaussian distribution assumption. Furthermore, inspired by the tuning parameter selection scheme in these graphical models, we develop a new model selection scheme, called "omnibus cross-validation", which is a general cross-validation scheme applicable to a wide range of variable selection problems.

In Chapter 3, we explore the connection between conditional independence measures induced by distances on a metric space and those induced by reproducing kernels associated with a reproducing kernel Hilbert space (RKHS). For certain distance and kernel pairs, we show that the distance-based conditional independence measures are equivalent to their kernel-based counterparts. On the other hand, we also show that some popular kernel conditional independence measures in machine learning, which are based on the Hilbert-Schmidt norm of a certain cross-conditional covariance operator, do not have a simple distance representation, except in some limited cases. This chapter shows that the distance and kernel measures of conditional independence are not fully equivalent, unlike the case of joint independence, where equivalence was shown by Sejdinovic et al. (2013).

In Chapter 4, we introduce a skewed Gaussian graphical model as an extension of the Gaussian graphical model. One of the appealing properties of the Gaussian distribution is that conditional independence can be fully characterized by the sparseness of the precision matrix. The skewed Gaussian distribution adds a shape parameter to the Gaussian distribution to take into account possible skewness in the data; thus it is more flexible than the Gaussian model. Nevertheless, the appealing property of the Gaussian distribution is retained to a large degree: conditional independence is still characterized by the sparseness of the parameters, which now include a shape parameter in addition to the precision matrix. As a result, the skewed Gaussian graphical model can be efficiently estimated through a penalized likelihood method, just like the Gaussian graphical model. We develop an algorithm to maximize the penalized likelihood based on the alternating direction method of multipliers, and establish the asymptotic normality and variable selection consistency of the new estimator. Through simulations, we demonstrate that our method performs better than the Gaussian and Gaussian copula methods when their distributional assumptions are not satisfied. The method is applied to a breast cancer microRNA dataset to construct a gene network, which shows better interpretability than the Gaussian graphical model.
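For context, the display below recalls the standard Gaussian graphical model facts that Chapter 4 extends: conditional independence between two coordinates is equivalent to a zero in the precision matrix, and a sparse precision matrix is commonly estimated by an L1-penalized likelihood (the graphical lasso). This is generic background in generic notation; the skewed model's shape parameter and its exact penalized likelihood are given in the chapter itself and are not reproduced here.

```latex
% Standard Gaussian graphical model background (not the skewed model itself):
% zeros in the precision matrix <=> conditional independence, and the
% graphical lasso fits a sparse precision matrix by penalized likelihood.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
\[
  X \sim N_p(\mu,\Sigma), \qquad \Theta = \Sigma^{-1}, \qquad
  X_i \perp\!\!\!\perp X_j \mid X_{-\{i,j\}}
  \;\Longleftrightarrow\; \Theta_{ij} = 0,
\]
\[
  \widehat{\Theta}_\lambda
  = \operatorname*{arg\,max}_{\Theta \succ 0}
    \Bigl\{ \log\det\Theta - \operatorname{tr}(S\Theta)
            - \lambda \sum_{i \neq j} \lvert \Theta_{ij} \rvert \Bigr\},
\]
where $S$ is the sample covariance matrix and $\lambda > 0$ is a tuning
parameter, selected for example by cross-validation.
\end{document}
```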
In Chapter 5, we introduce a general cross-validation scheme, which we call "omnibus cross-validation", that applies to a wide class of estimation problems, including those with no response, those with no likelihood, and those that are not even of the form of an M-estimate. Traditional cross-validation usually involves prediction of a response variable: part of the data is used for model estimation and the rest is held out for prediction, and prediction performance determines the best subset of variables or the best tuning parameter for a variable selector. However, in many modern applications, such as statistical graphical models, there is no response variable to predict, but only a likelihood to maximize; in other applications, there is not even a likelihood, but only a generic objective function. Although cross-validation has been applied naively to some of these situations, there is no systematic theory to back up such practices. We develop a systematic theory to support the new method, including Fisher consistency at the population level and asymptotic consistency at the sample level, both for best subset variable selection and for determining the tuning parameter along the solution path of existing variable selectors such as the Lasso. We conduct a simulation study to investigate the performance of the new method.
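To make concrete what cross-validation without a response variable looks like, here is a minimal sketch of selecting the graphical lasso penalty by held-out Gaussian log-likelihood: each fold is scored by the likelihood of the held-out rows under the precision matrix fitted on the remaining data. This illustrates the general idea only and is not the dissertation's omnibus cross-validation procedure; the helper names (heldout_gaussian_loglik, cv_select_alpha) are hypothetical, and the sketch assumes NumPy and scikit-learn are available.

```python
# Sketch: cross-validation with a likelihood instead of a prediction error.
# Each held-out fold is scored by its Gaussian log-likelihood under the
# precision matrix fitted on the remaining folds.
import numpy as np
from sklearn.covariance import graphical_lasso
from sklearn.model_selection import KFold


def heldout_gaussian_loglik(X_test, precision):
    """Gaussian log-likelihood of held-out rows (up to additive constants)."""
    S_test = np.cov(X_test, rowvar=False)
    _, logdet = np.linalg.slogdet(precision)
    return logdet - np.trace(S_test @ precision)


def cv_select_alpha(X, alphas, n_splits=5, seed=0):
    """Pick the graphical-lasso penalty with the best held-out likelihood."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = np.zeros(len(alphas))
    for train_idx, test_idx in kf.split(X):
        S_train = np.cov(X[train_idx], rowvar=False)
        for j, alpha in enumerate(alphas):
            _, precision = graphical_lasso(S_train, alpha=alpha)
            scores[j] += heldout_gaussian_loglik(X[test_idx], precision)
    return alphas[int(np.argmax(scores))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))  # toy data; no response variable
    best_alpha = cv_select_alpha(X, alphas=[0.02, 0.05, 0.1, 0.2])
    print("selected penalty:", best_alpha)
```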