Kernel Methods for Neuroimaging Genomewide Association Studies

Open Access
Hua, Wen-yu
Graduate Program:
Doctor of Philosophy
Document Type:
Date of Defense:
May 27, 2014
Committee Members:
  • Debashis Ghosh, Dissertation Advisor
  • Debashis Ghosh, Committee Chair
  • Yu Zhang, Committee Member
  • Qunhua Li, Committee Member
  • Le Bao, Committee Member
  • Daniel Kifer, Special Member
Keywords:
  • Kernel methods
  • GWAS
  • Neuroimaging analysis
  • Multiple comparison procedures
Measuring high-dimensional dependence is a difficult and important problem in statistics and machine learning, with frequent applications in biostatistics. This dissertation studies kernel methods for measuring high-dimensional dependence and applies them to neuroimaging data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. The ADNI study provides a comprehensive dataset for finding genetic variants that may contribute to differences in brain volume among subjects. It contains brain magnetic resonance imaging (MRI) scans of 741 enrolled subjects, which are the main neuroimaging data used in this work. The first part of the dissertation uses distance covariance for association analysis on the ADNI dataset: the distance covariance test is consistent, can identify both linear and non-linear associations, and preserves within-variable interactions through the $L_2$ distance kernel. Distance covariance is then combined with a novel false discovery rate (FDR) modeling algorithm to examine a larger number of genetic variants simultaneously, which yields more significant findings than previously reported in the literature. The association analysis is next extended to the case where covariates are present. There, distance covariance is studied in Hilbert spaces, which leads to a new representation called the kernel distance covariance (KDC). As part of this exploration, we establish the equivalence of KDC and kernel machine regression (KMR) under certain conditions, which leads to two major contributions: 1) the equivalence makes it possible to measure the association between two variables while accounting for covariate effects; and 2) KMR is found to be a member of the KDC family, whose members are distinguished by their input kernels.
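As an illustration of the distance covariance test mentioned above, the following minimal sketch computes the standard sample (V-statistic) squared distance covariance from double-centered $L_2$ distance matrices and attaches a permutation p-value. It is not the dissertation's implementation: the function names, the permutation scheme, and the default number of permutations are assumptions made for this example, and the FDR modeling step is not shown.

```python
import math
import random


def distance_matrix(points):
    """Pairwise Euclidean (L2) distances; each point is a sequence of floats."""
    return [[math.dist(p, q) for q in points] for p in points]


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    n = len(d)
    row = [sum(r) / n for r in d]
    col = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - col[j] + grand for j in range(n)]
            for i in range(n)]


def dcov_sq(a, b, perm=None):
    """Sample squared distance covariance from two centered matrices.

    If perm is given, the rows and columns of b are permuted, which is
    equivalent to shuffling the second sample.
    """
    n = len(a)
    if perm is None:
        perm = range(n)
    total = sum(a[i][j] * b[perm[i]][perm[j]]
                for i in range(n) for j in range(n))
    return total / n ** 2


def dcov_permutation_test(x, y, n_perm=199, seed=0):
    """Return (statistic, permutation p-value) for independence of x and y."""
    rng = random.Random(seed)
    a = double_center(distance_matrix(x))
    b = double_center(distance_matrix(y))
    observed = dcov_sq(a, b)
    idx = list(range(len(x)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        if dcov_sq(a, b, idx) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

With a synthetic non-linear relationship, for example `y` equal to the square of `x` plus a little noise, calling `dcov_permutation_test(x, y)` would be expected to return a small p-value, which reflects the ability of distance covariance to detect associations that a linear correlation can miss.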
This framework, which unifies KDC and KMR, leads to the problem of finding the optimal kernel pair within the KDC family, which is the concluding part of this work. In searching for the optimal kernel pair, we focus on the situation with no covariate effects, in which KDC is related to the Hilbert-Schmidt Independence Criterion (HSIC). A local-alternative Power Maximization algorithm (LaPM) is proposed to search for the optimal kernel pair at a given test level: LaPM selects the kernel pair that maximizes the test power against local alternatives (the Pitman approach). Numerical analysis shows that LaPM achieves higher power than existing approaches, and our real-data experiment on the ADNI study shows that the optimal kernel pair selected by the proposed method can be used to interpret high-dimensional data.
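To make the role of the kernel pair concrete, the sketch below computes the commonly used biased empirical HSIC, the mean of the elementwise product of two centered Gram matrices, for Gaussian kernels, and evaluates it over a small grid of bandwidth pairs. This does not reproduce LaPM, whose details are not given here; the Gaussian kernel choice, the bandwidth grid, the 1/n² normalization, and all function names are assumptions for illustration only.

```python
import math
import random


def gaussian_gram(points, bandwidth):
    """Gram matrix of the Gaussian kernel exp(-||p - q||^2 / (2 * bandwidth^2))."""
    return [[math.exp(-math.dist(p, q) ** 2 / (2.0 * bandwidth ** 2))
             for q in points] for p in points]


def center_gram(k):
    """Double-center a Gram matrix (equivalent to H K H with H = I - 11'/n)."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + grand for j in range(n)]
            for i in range(n)]


def hsic_biased(x, y, bw_x, bw_y):
    """Biased empirical HSIC, trace(K H L H) / n^2, for one kernel pair."""
    n = len(x)
    kc = center_gram(gaussian_gram(x, bw_x))
    lc = center_gram(gaussian_gram(y, bw_y))
    return sum(kc[i][j] * lc[i][j]
               for i in range(n) for j in range(n)) / n ** 2


def hsic_over_grid(x, y, bandwidths):
    """Evaluate HSIC for every (bw_x, bw_y) pair in a hypothetical grid."""
    return {(bx, by): hsic_biased(x, y, bx, by)
            for bx in bandwidths for by in bandwidths}
```

The grid only shows that the statistic changes with the kernel pair. Picking the pair with the largest raw statistic is not the same as maximizing test power at a given level, which is the criterion the abstract attributes to LaPM.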