Statistical frameworks for evaluating evolutionary hypotheses from genomic data

Open Access
- Author:
- Mughal, Mehreen
- Graduate Program:
- Bioinformatics and Genomics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- July 24, 2020
- Committee Members:
- Michael DeGiorgio, Dissertation Advisor/Co-Advisor
Stephen Wade Schaeffer, Committee Chair/Co-Chair
George H Perry, Committee Member
Jesse R Lasky, Committee Member
Francesca Chiaromonte, Outside Member
George H Perry, Program Head/Chair
Michael DeGiorgio, Committee Chair/Co-Chair
Stephen Wade Schaeffer, Dissertation Advisor/Co-Advisor
- Keywords:
- Population genetics
Evolution
Anthropological genetics
Regression
Statistical learning
- Abstract:
- As genome sequence data become more widely available, the demand for methods that can efficiently and rigorously extract relevant information from these data also escalates. Historically, many of these analytical needs have been met by summary statistics. Recent computational and statistical advances, however, have provided frameworks for developing more sophisticated methods. Here, we explore the properties of summary statistics for classifying and analyzing selection, and we evaluate their performance under suboptimal conditions. We learn how different types of selection affect the spatial distribution of genomic diversity by measuring that diversity with a combination of summary statistics. This allows us not only to classify and differentiate among different types of positive selection, but also to learn about the parameters that shape these selective sweeps. We also explore the properties of the popular F- and D-statistics and find that some of them are biased when related or inbred individuals are included. We derive a correction for these biases and apply the corrected statistics to make improved inferences about relationships among different human populations.
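For reference, the sketch below shows the standard, uncorrected D-statistic (the ABBA-BABA test) mentioned in the abstract, in its commonly used allele-frequency form. It is an illustration under stated assumptions, not the dissertation's method: it assumes biallelic sites, derived alleles polarized against an outgroup P4, and NumPy. The function name `patterson_d` is hypothetical, and the correction for related or inbred individuals described above is not reproduced here.

```python
import numpy as np


def patterson_d(p1, p2, p3, p4):
    """Standard (uncorrected) D-statistic from per-site derived-allele frequencies.

    p1, p2, p3, p4: sequences of derived-allele frequencies in populations
    P1, P2, P3 and the outgroup P4, one entry per biallelic site.
    Assumption: frequencies lie in [0, 1] and all arrays have equal length.
    """
    p1, p2, p3, p4 = (np.asarray(x, dtype=float) for x in (p1, p2, p3, p4))

    # Expected ABBA pattern: P2 and P3 share the derived allele, P1 and P4 do not.
    abba = (1.0 - p1) * p2 * p3 * (1.0 - p4)
    # Expected BABA pattern: P1 and P3 share the derived allele, P2 and P4 do not.
    baba = p1 * (1.0 - p2) * p3 * (1.0 - p4)

    denom = abba.sum() + baba.sum()
    if denom == 0:
        raise ValueError("no informative ABBA/BABA sites")
    # D > 0 suggests excess allele sharing between P2 and P3; D < 0, between P1 and P3.
    return (abba.sum() - baba.sum()) / denom


# Toy example with one haploid sample per population:
# two ABBA sites and one BABA site give D = (2 - 1) / 3.
print(patterson_d([0, 0, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0]))
```

Under the null hypothesis of no gene flow between P3 and either P1 or P2, ABBA and BABA patterns are expected to be equally frequent, so D is expected to be near zero; in practice its significance is usually assessed with a block jackknife over genomic windows, which this sketch omits.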