Detecting Footprints of Ancient Natural Selection from Population Genomic Data
Open Access
- Author:
- Cheng, Xiaoheng
- Graduate Program:
- Molecular, Cellular and Integrative Biosciences
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- August 21, 2020
- Committee Members:
- Michael DeGiorgio, Dissertation Advisor/Co-Advisor
Stephen Wade Schaeffer, Committee Chair/Co-Chair
George H Perry, Committee Member
Michael DeGiorgio, Committee Chair/Co-Chair
Le Bao, Outside Member
Melissa Rolls, Program Head/Chair
Stephen Wade Schaeffer, Dissertation Advisor/Co-Advisor
- Keywords:
- Natural selection
Population genetics
Balancing selection
Maximum likelihood ratio
Adaptation
Great apes
- Abstract:
- Targets of natural selection in the genome often tell intriguing stories of adaptive change during an organism's evolution, and population geneticists have therefore studied and searched extensively for the genomic footprints left by selection. Nonetheless, the unprecedented volume of population whole-genome sequencing data accumulated over the past decade poses new challenges for method development. In this dissertation, I present three sets of statistics for whole-genome scans for footprints of ancient natural selection, each addressing a unique challenge in its respective area. I thoroughly benchmarked their performance with forward simulations, validated them by recovering known selected loci in empirical data, and made novel discoveries that shed light on adaptations in humans and great apes. I also implemented these methods in open-source software for future use by the scientific community. For ancestral positive selection, I developed the ancestral branch statistic (ABS), a four-population summary statistic that specifically detects positive selection that occurred in the lineage ancestral to a pair of target populations, before they diverged. I show that ABS performs comparably to the model-based 3P-CLR in uncovering loci under positive selection, while being more robust to incomplete or soft sweeps and substantially more specific for ancestral selective events. For ancient balancing selection, I extended existing detection frameworks to consider populations from multiple species, and show that the resulting methods have high power to specifically detect trans-species balancing selection. This set of approaches, comprising the model-based $T_{\text{trans}}$ and the summary statistics HKA$_{\text{trans}}$ and NCD$_{\text{trans}}$, circumvents the inconvenience of relying on trans-species polymorphism data and provides a diverse toolkit for future studies of balanced polymorphisms.
I also created a set of composite likelihood-ratio statistics based on mixture models ($B_0$, $B_{0,\text{MAF}}$, $B_1$, $B_2$, and $B_{2,\text{MAF}}$) that accommodate variability in the size of balancing selection footprints, each tailored to a different type of input data. I demonstrate that the $B$ statistics, even when applied with the least favorable window size, achieve power comparable to the optimal performance of existing methods, and that they are robust to elevated mutation rates and other confounding factors. I further extended the $B$ statistics to multi-allelic balancing selection and validated their performance, yielding the first set of model-based selection scans tailored to balanced polymorphisms with more than two alleles. In sum, I believe these tools considerably advance the population-genetic arsenal for selection scans and will be of great use in future studies of ancient adaptive events.