DENSITY ESTIMATION AND MODAL BASED METHOD FOR HAPLOTYPING AND RECOMBINATION

Open Access
- Author:
- Mao, Xianyun
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- October 07, 2010
- Committee Members:
- Bruce G Lindsay, Dissertation Advisor/Co-Advisor
Bruce G Lindsay, Committee Chair/Co-Chair
Naomi S Altman, Committee Member
Yu Zhang, Committee Member
Stephen Wade Schaeffer, Committee Member
- Keywords:
- Expectation Maximization
Density Estimator
Genetics
- Abstract:
- Genetic problems such as haplotype inference and recombination analysis are rarely studied using nonparametric models. We present new methods for analyzing genetic data based on kernel density estimation and a modal expectation-maximization (MEM) method. We also use a degrees-of-freedom (DOF) calculation for bandwidth selection and diagnostics. For the problem of inferring haplotypes from genotypes, we construct a likelihood function that depends on the unknown haplotype density. We then apply a likelihood EM to a naive initial estimator to create an updated density that has higher likelihood. The density is then used to find the most likely haplotype pairs for any genotype. The performance of the method is tested on simulated data and small sets of real data. To improve the performance of our method on large data sets (~1,000 individuals, 10,000 sites), we develop DOF as a diagnostic tool for partitioning the data. We then use MEM to solve each partition and to merge the solutions. We show that the new method is comparable to available methods in both time and accuracy. In a similar fashion, we can define a density estimator for binary sequences (haplotypes) in the presence of recombination and mutation. With the new density, one can estimate the probability of recombination for given sites.
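
The haplotype-inference steps summarized in the abstract (a density over haplotypes, an EM update from a naive start, then selection of the most likely haplotype pair per genotype) can be illustrated with a minimal Python sketch. This is not the dissertation's estimator: the product kernel on Hamming distance, the fixed bandwidth `lam`, the 0/1/2 genotype coding, the random-mating assumption, and all function names are assumptions made here for illustration only. Bandwidth selection by DOF, the MEM step, and partitioning are not shown.

```python
from itertools import product
from collections import defaultdict


def compatible_pairs(genotype):
    """Unordered haplotype pairs (h1, h2) with h1[j] + h2[j] == genotype[j].

    Assumed coding: 0 and 2 are homozygous sites, 1 is heterozygous.
    """
    het = [j for j, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites; het sites set below
    pairs = []
    # Fix the phase of the first heterozygous site so each pair appears once.
    for bits in product((0, 1), repeat=max(len(het) - 1, 0)):
        h1, h2 = base[:], base[:]
        if het:
            for j, b in zip(het, (0,) + bits):
                h1[j], h2[j] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def kernel(h, c, lam):
    """Product kernel on binary sequences: lam per mismatch, 1 - lam per match."""
    d = sum(a != b for a, b in zip(h, c))
    return (lam ** d) * ((1 - lam) ** (len(h) - d))


def density(h, weights, lam):
    """Kernel mixture density f(h) = sum_c w_c * K(h, c)."""
    return sum(w * kernel(h, c, lam) for c, w in weights.items())


def em_haplotype_density(genotypes, lam=0.05, n_iter=50):
    """EM for the mixture weights, with kernel centers and bandwidth held fixed."""
    pairs = [compatible_pairs(g) for g in genotypes]
    centers = sorted({h for ps in pairs for p in ps for h in p})
    weights = {c: 1.0 / len(centers) for c in centers}  # naive uniform start
    for _ in range(n_iter):
        f = {h: density(h, weights, lam) for h in centers}
        new = defaultdict(float)
        for ps in pairs:
            # E-step over pairs. Under random mating the posterior is
            # proportional to f(h1) * f(h2); the factor 2 for distinct
            # haplotypes is constant within a genotype and cancels.
            joint = [f[h1] * f[h2] for h1, h2 in ps]
            total = sum(joint)
            for (h1, h2), jp in zip(ps, joint):
                post = jp / total
                for h in (h1, h2):
                    # E-step over kernel centers for each haplotype copy.
                    for c in centers:
                        new[c] += post * weights[c] * kernel(h, c, lam) / f[h]
        # M-step: each individual contributes two haplotype copies.
        weights = {c: new[c] / (2 * len(genotypes)) for c in centers}
    return weights


def most_likely_pair(genotype, weights, lam=0.05):
    """Compatible pair maximizing f(h1) * f(h2) under the fitted density."""
    return max(
        compatible_pairs(genotype),
        key=lambda p: density(p[0], weights, lam) * density(p[1], weights, lam),
    )


# Toy data: six unambiguous individuals and one double heterozygote.
toy_genotypes = [(0, 0)] * 3 + [(2, 2)] * 3 + [(1, 1)]
toy_weights = em_haplotype_density(toy_genotypes)
toy_phase = most_likely_pair((1, 1), toy_weights)
```

In this toy data the unambiguous individuals carry only the haplotypes 00 and 11, so the fitted density favors phasing the double heterozygote as 00/11 rather than 01/10. The sketch enumerates every compatible pair, which grows exponentially with the number of heterozygous sites. That is why it is only usable on short segments.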