Projection Test for High-dimensional Mean Vectors with Optimal Direction
Open Access
- Author:
- Huang, Yuan
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 11, 2015
- Committee Members:
- Runze Li, Dissertation Advisor/Co-Advisor
Runze Li, Committee Chair/Co-Chair
David Russell Hunter, Committee Member
Naomi S Altman, Committee Member
Lan Kong, Committee Member
Aleksandra B Slavkovic, Committee Member
- Keywords:
- High-dimensional data
Hotelling's T2 test
Projection test
One-sample problem
Two-sample problem
- Abstract:
- Testing the population mean is fundamental in statistical inference. When the dimensionality of a population is high, the traditional Hotelling's $T^2$ test becomes practically infeasible because the sample covariance matrix is singular. In this dissertation, we propose a projection-based testing method for the high-dimensional one-sample and two-sample mean problems. Our method projects the original sample onto a lower-dimensional space and conducts tests on the projected sample. Unlike existing projection-based tests, our approach is based on data-driven estimation of the optimal direction. Moreover, our test preserves the equivalence of the null hypotheses for the original sample and the projected sample, which previous research has often ignored. We show that the test based on the projected sample is an exact $t$-test under the normality assumption and an asymptotic $\chi^2$-test with one degree of freedom without the normality assumption. In the one-sample problem, we are interested in testing $H_0:\bmu=\bmu_0$ against $H_1:\bmu \neq \bmu_0$ for a random sample of size $N$ from a $p$-dimensional population $X$ with finite mean vector $\bmu$ and finite positive definite covariance matrix $\Sigma$. We derive the theoretically optimal direction, with which the test attains the greatest power under the alternative. We show that projection onto a one-dimensional space with direction $\Sigma^{-1}(\bmu-\bmu_0)$ leads to the optimal power, regardless of the distributional assumption. The null hypothesis for the projected sample is $(\bmu-\bmu_0)^T\Sigma^{-1}(\bmu-\bmu_0)=0$, which holds if and only if $\bmu=\bmu_0$ for a full-rank $\Sigma$. A computationally efficient algorithm is developed to implement the new test. Its local asymptotic properties are studied, and we show that under mild conditions the proposed test outperforms the major existing methods.
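The one-sample idea described above can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the sample-splitting step, the ridge-regularized plug-in estimate of $\Sigma^{-1}(\bmu-\bmu_0)$, and the names `projection_t_statistic`, `ridge`, and `n_direction` are all assumptions made here for illustration.

```python
import numpy as np


def projection_t_statistic(x, mu0, ridge=1.0, n_direction=None):
    """Illustrative one-sample projection test.

    Returns the t statistic of the projected sample and its degrees of freedom.
    """
    x = np.asarray(x, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n, p = x.shape

    # Assumption: the first part of the sample estimates the direction and
    # the remaining part is projected and tested, so the two are independent.
    n1 = n // 2 if n_direction is None else n_direction
    x_dir, x_test = x[:n1], x[n1:]

    # Plug-in direction: a regularized stand-in for Sigma^{-1} (mu - mu0).
    # The ridge term (an assumption) keeps the p x p matrix invertible
    # even when p exceeds the number of observations used.
    s1 = np.cov(x_dir, rowvar=False)
    direction = np.linalg.solve(s1 + ridge * np.eye(p),
                                x_dir.mean(axis=0) - mu0)

    # Project the held-out observations onto the estimated direction.
    y = (x_test - mu0) @ direction

    # Ordinary one-sample t statistic on the one-dimensional projections.
    m = y.size
    t_stat = np.sqrt(m) * y.mean() / y.std(ddof=1)
    return t_stat, m - 1
```

The returned statistic would be compared with a $t$ distribution with the returned degrees of freedom. In this sketch the exact $t$ reference under normality relies on the direction being computed from observations that are independent of the ones being tested.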
Our numerical comparison shows that the new test maintains the Type I error rate well and can be more powerful than the existing tests for high-dimensional data. In the two-sample problem, we are interested in testing $H_0:\bmu_1=\bmu_2$ against $H_1:\bmu_1 \neq \bmu_2$ for two independent random samples of sizes $N_i$ from populations with finite mean vectors $\bmu_i$ and finite positive definite covariance matrices $\Sigma_i$, $i=1,2$, respectively. When $\Sigma_1=\Sigma_2=\Sigma$, we prove that the optimal direction is $\Sigma^{-1} (\bmu_1-\bmu_2)$, regardless of the distributional assumption. When the covariance matrices are unequal, we show that the optimal projection direction is $\left( \Sigma_1 + \frac{N_1}{N_2} \Sigma_2 \right)^{-1} (\bmu_1-\bmu_2)$ for normal populations, by first applying Bennett's transformation to obtain a one-sample sequence of size $N_1$ that is distributed as $N( \bmu_1 - \bmu_2, \Sigma_1 + \frac{N_1}{N_2} \Sigma_2)$, assuming $N_1 <N_2$. Both theoretically and empirically, we demonstrate that the proposed test can be much more powerful than the existing ones.
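For the unequal-covariance case, the reduction to a one-sample problem can be sketched in Python as follows. The code uses the commonly cited form of Bennett's transformation; the dissertation's exact formulation may differ, and the function name `bennett_transform` is hypothetical.

```python
import numpy as np


def bennett_transform(x, y):
    """Combine two samples into a single sample of size N1 (needs N1 <= N2).

    x has shape (N1, p) and y has shape (N2, p). For independent normal
    samples, the rows of the result are i.i.d. with mean mu1 - mu2 and
    covariance Sigma1 + (N1 / N2) * Sigma2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[0], y.shape[0]
    if n1 > n2:
        raise ValueError("expected N1 <= N2; swap the two samples otherwise")

    scale = np.sqrt(n1 / n2)
    y_head = y[:n1]  # the first N1 observations of the larger sample

    # z_i = x_i - sqrt(N1/N2) * y_i
    #       + (N1 * N2)^(-1/2) * sum_{j <= N1} y_j
    #       - (1 / N2) * sum_{j <= N2} y_j
    return (x - scale * y_head
            + y_head.sum(axis=0) / np.sqrt(n1 * n2)
            - y.mean(axis=0))
```

The transformed rows can then be treated as a one-sample problem with null mean zero. As a quick sanity check, the average of the transformed rows equals the difference of the two sample means, and when $N_1=N_2$ the transformation reduces to the paired differences $x_i - y_i$.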