An EM Based Tagging SNP Selection Algorithm Incorporating Genotyping Errors

Open Access
Yang, Tao
Graduate Program:
Master of Science
Document Type:
Master Thesis
Date of Defense:
Committee Members:
  • Vernon Michael Chinchilli, Thesis Advisor
Keywords:
  • tagging SNP selection
  • EM algorithm
  • genotyping errors
Many tagging SNP selection methods depend heavily on estimated haplotype frequencies. One limitation of existing tagging SNP selection algorithms is that they assume the reported genotypes are error-free, although genotyping errors are often unavoidable in practice. Recent studies have demonstrated that even slight genotyping errors can have serious consequences for haplotype reconstruction and frequency estimation. In this thesis, we present a tagging SNP selection method that allows for genotyping errors. Our method is based on the pair-wise r² tagging SNP selection algorithm proposed by Carlson et al. We modified the standard EM algorithm in Carlson’s method to incorporate genotyping errors, in an attempt to obtain better estimates of the haplotype frequencies and the r² measure. Through extensive simulation studies, we compared the performance of our algorithm with that of the original algorithm. The number of tags selected by both methods increased as the genotyping error rate increased, though our method led to a smaller increase. The power of haplotype association tests using the selected tags decreased dramatically as the genotyping error rate increased. The power of single marker tests also decreased, but by less than the power of haplotype tests. When the mean number of tags selected by both methods was restricted to be similar to the baseline number, Carlson’s method and our method led to similar power for the subsequent haplotype and single marker tests. Our results showed that, by incorporating random genotyping errors, our method can select tagging SNPs more efficiently than Carlson’s method. The computer program that implements our tagging SNP selection algorithm is available at our web site:
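To illustrate the pair-wise r² criterion that the abstract refers to, the following is a minimal Python sketch, not the thesis program. It assumes haplotype frequencies have already been estimated (for example by an EM step, which is not shown), computes r² between two biallelic SNPs from those frequencies, and then picks tags greedily in the spirit of the Carlson et al. procedure, simplified to one tag per bin. The function names, the 0/1 allele coding, the 0.8 threshold, and the toy frequencies are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_r2(hap_freqs, i, j):
    """r^2 between biallelic SNPs i and j.

    hap_freqs maps a haplotype (tuple of 0/1 alleles) to its frequency.
    """
    p_i = sum(f for h, f in hap_freqs.items() if h[i] == 1)
    p_j = sum(f for h, f in hap_freqs.items() if h[j] == 1)
    p_ij = sum(f for h, f in hap_freqs.items() if h[i] == 1 and h[j] == 1)
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        # A monomorphic SNP carries no LD information.
        return 0.0
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_tags(hap_freqs, n_snps, threshold=0.8):
    """Simplified greedy binning: one tag per bin (threshold is an assumption)."""
    # covers[s] = SNPs (including s itself) whose r^2 with s meets the threshold.
    covers = {s: {s} for s in range(n_snps)}
    for i, j in combinations(range(n_snps), 2):
        if pairwise_r2(hap_freqs, i, j) >= threshold:
            covers[i].add(j)
            covers[j].add(i)

    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs;
        # ties go to the lowest index because the candidates are sorted.
        best = max(sorted(untagged), key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy haplotype frequencies over three SNPs (hypothetical values).
freqs = {(0, 0, 0): 0.5, (1, 1, 0): 0.3, (1, 1, 1): 0.2}
print(greedy_tags(freqs, 3))  # [0, 2]
```

In the toy data, SNPs 0 and 1 always co-occur (r² = 1), so one of them tags the other, while SNP 2 has r² = 0.25 with both and must be kept as its own tag. Because r² is computed from the estimated haplotype frequencies, any bias those estimates inherit from genotyping errors feeds directly into which SNPs clear the threshold.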