Statistical methods for identifying novel DNA sequence variants associated with atherosclerosis
Open Access
- Author:
- Krishnan, Mera
- Graduate Program:
- Biobehavioral Health
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- September 28, 2012
- Committee Members:
- George Patrick Vogler, Committee Chair/Co-Chair
David John Vandenbergh, Committee Member
Gerald Eugene McClearn, Committee Member
Alexander F Wilson, Special Member
Hobart H Cleveland III, Committee Member
George Patrick Vogler, Dissertation Advisor/Co-Advisor
- Keywords:
- Genetics
Genetic Epidemiology
Statistical Genetics
Atherosclerosis
Cardiovascular Disease
- Abstract:
- The common disease/common variant hypothesis postulates that variation in susceptibility to common complex diseases can be explained by common DNA sequence variants rather than rare variants. Rare variant detection can be limited by insufficient sample size and inadequate sequencing coverage. Candidate gene sequencing in a large sample may be a direct and effective method for discovering rare functional variants. The purpose of this study was to determine whether rare sequence variants that are potentially associated with cholesterol-related phenotypes in the ClinSeq™ study tend to cluster in functional domains, and whether novel analytical approaches that leverage genomic and proteomic information may be useful in association analyses. A Fisher’s Exact Test was applied to sequence variants in sixteen candidate genes to test for an association of these variants with the highest and lowest quartiles of low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides. No variant reached Bonferroni-level significance. Counts of rare variants (variants with a minor allele frequency < 5%) within genes found to be significantly associated with HDL-C by the Fisher’s Exact Test were compared between patients in the highest and lowest quartiles of HDL-C. Of the six genes tested (ABCA1, ABCG8, HMGCR, INSIG2, LDLR, and CETP), the total number of rare variants differed significantly between the HDL-C extremes in only one gene, ABCG8 (p = 0.011). The total numbers of rare variants within exons, introns, and transcription factor binding sites were also compared. Significant differences in the total number of rare variants within exons were found for CETP (p = 0.009), with suggestive differences for ABCG8 (p = 0.056) and LDLR (p = 0.066). Only ABCG8 had a significant difference in sums of rare variant counts in introns between high and low HDL-C (p = 0.003). Only CETP had a significant difference in sums of rare variant counts in transcription factor binding sites (p = 0.043). For variants with minor allele frequencies less than 1% or less than 0.5%, there were no significant differences between individuals with extreme phenotypes in any functional domain classification. Several recently developed methods of rare variant association analysis were compared to evaluate their relative ability to detect both previously known putative causal variants and novel variants associated with LDL-C, HDL-C, and triglyceride levels. These methods (the Cohort Allelic Sums Test, the Combined Multivariate and Collapsing method, the Weighted-Sum Method, and the C-Alpha method) were applied to functional domains in a subset of candidate genes. The functional domain classifications considered were exons, introns, transcription factor binding sites, DNase hypersensitivity sites, enhancers, silencers, and repressors. Most methods found a highly significant association of LDL-C with DNase hypersensitivity sites within LDLR. Incorporating multi-species conservation information into quantitative trait association analyses with the Variable Threshold method yielded significant associations of cholesterol traits with AP1M2, LDHB, and LIPC. A novel method developed as part of this dissertation leveraged thermodynamic predictions from the FoldX program to test for associations of multiple mutations simultaneously with quantitative phenotypes; it showed promise in the exemplar case of CETP and HDL-C.
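As a rough illustration of the extreme-quartile comparison described in the abstract, the sketch below collapses rare variants in a region into a carrier/non-carrier indicator (in the spirit of a Cohort Allelic Sums Test) and applies a two-sided Fisher's exact test to the resulting 2x2 table. It is a minimal sketch and not the dissertation's implementation: the function names `fisher_exact_two_sided` and `collapse_carriers`, the `"high"`/`"low"` group labels, and the input layout are all assumptions made for this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so floating-point ties count as "as extreme".
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def collapse_carriers(genotypes, groups):
    """Collapse rare variants per individual and tabulate by phenotype group.

    genotypes: one list per individual of minor-allele counts at the rare
               variant sites in a region (hypothetical layout).
    groups:    "high" or "low" per individual (extreme phenotype quartiles).
    Returns (a, b, c, d): carriers/non-carriers in "high", then in "low".
    """
    table = {"high": [0, 0], "low": [0, 0]}
    for counts, group in zip(genotypes, groups):
        is_carrier = any(n > 0 for n in counts)
        table[group][0 if is_carrier else 1] += 1
    return tuple(table["high"] + table["low"])
```

With these helpers, `fisher_exact_two_sided(*collapse_carriers(genotypes, groups))` gives a p-value for one region. Testing many variants or regions would call for a multiple-testing correction such as the Bonferroni threshold mentioned in the abstract.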