New Statistical Tools for Independence and Conditional Independence

Open Access
- Author:
- Cai, Zhanrui
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- May 03, 2021
- Committee Members:
- Bharath Kumar Sriperumbudur, Major Field Member
Yanyuan Ma, Major Field Member
Bing Li, Major Field Member
Ming Wang, Outside Unit & Field Member
Runze Li, Chair & Dissertation Advisor
Ephraim Mont Hanks, Program Head/Chair
- Keywords:
- Online learning
dimension reduction
test of independence
test of conditional independence
causal discovery
- Abstract:
- Statistical independence and conditional independence are fundamental topics in statistics and have wide applications in many other fields. Given two random variables, it is of interest to test whether they are independent of each other. It is also important to test whether the independence holds when other random variables are given, namely, conditional independence. These concepts are closely connected with sufficient dimension reduction, where the goal is to find linear combinations of the predictors such that, given those linear combinations, the original predictors become independent of the response. We briefly summarize the three projects included in the thesis. In the first project, we perform online sufficient dimension reduction for streaming data. Specifically, we adapt stationary sliced inverse regression to cope with rapidly changing environments by implementing it in an online fashion. This online learner consists of two steps. In the first step, we construct an online estimate of the kernel matrix; in the second step, we propose two online algorithms, one motivated by the perturbation method and the other derived from gradient descent optimization, to perform online singular value decomposition. The theoretical properties of this online learner are established. We demonstrate its numerical performance through simulations and real-world applications. All numerical studies confirm that this online learner performs as well as the batch learner. In the second project, we propose a nonparametric independence test based on mutual information. Unlike existing work, we estimate the mutual information in a conditional density form, whose dimension can be reduced to 1 with new projection methods. The optimal projection direction is estimated by maximizing a penalized mutual information.
Based on the optimal projection, we construct an independence test via the projected mutual information, which is insensitive to the dimensions of the random vectors. The test is consistent against fixed alternatives and can detect local alternatives at a fast rate, as if the variables were univariate. Numerical results indicate that the test is more powerful than existing independence tests, especially when the sample size is small or the dimension is large. In the third project, we propose a distribution-free conditional independence test. We first establish an equivalence between conditional independence and mutual independence. Based on this equivalence, we propose an index that measures conditional dependence by quantifying the mutual dependence among the transformed variables. The proposed index has several appealing properties. (a) It is distribution free, since the limiting null distribution of the proposed index does not depend on the population distributions of the data. Hence the critical values can be tabulated by simulations. (b) The proposed index ranges from zero to one, and equals zero if and only if conditional independence holds. Thus, it has nontrivial power under the alternative hypothesis. (c) It is robust to outliers and heavy-tailed data, since it is invariant to conditional strictly monotone transformations. (d) It has low computational cost, since it incorporates a simple closed-form expression and can be implemented in quadratic time. (e) It is insensitive to the tuning parameters involved in the calculation of the proposed index. (f) The new index is applicable to multivariate random vectors as well as to discrete data. All these properties enable us to use the new index as a statistical inference tool for various kinds of data. The effectiveness of the method is illustrated through extensive simulations and a real application to causal discovery.
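
For readers unfamiliar with the batch learner that the first project is compared against, the following is a minimal sketch of classical (batch) sliced inverse regression, assuming NumPy. It is not the online algorithm of the thesis: the function name `sir_directions`, the slicing scheme, and the defaults are illustrative assumptions. It forms the kernel matrix from slice means of the whitened predictors and takes its leading eigenvectors as the estimated directions.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Classical batch SIR (illustrative sketch, not the online variant)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    # Whiten the predictors: Z = (X - mu) Sigma^{-1/2}
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ Sigma_inv_sqrt
    # Slice the sorted response into groups of roughly equal size
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    # Kernel matrix: weighted sum of outer products of the slice means of Z
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original predictor scale
    _, v = np.linalg.eigh(M)          # eigenvalues in ascending order
    top = v[:, ::-1][:, :n_dirs]      # take the largest n_dirs
    B = Sigma_inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)
```

With simulated data whose response depends on the predictors only through one linear combination, the returned column should be close, up to sign, to that direction. An online version would instead update the slice means and covariance as each new observation arrives, which is the part the thesis develops.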