POPULATION GENOMIC STRUCTURE OF EUROPE: INFORMATIVE MARKERS, METHODS AND PHENOTYPES

Open Access
- Author:
- Bauchet, Marc Philippe
- Graduate Program:
- Anthropology
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- May 14, 2007
- Committee Members:
- Mark Shriver, Committee Chair/Co-Chair
Kenneth Monrad Weiss, Committee Member
Alan Walker, Committee Member
Hiroshi Akashi, Committee Member
- Keywords:
- Europe
population genetics
human pigmentation
genetic ancestry
- Abstract:
- Among continents, Europe is remarkable for its relatively small size, dearth of migration barriers and abundance of historical population movements. These features are compatible with the observation of small inter-population genetic distances and a relative lack of population structure compared to other continents. However, some phenotypic features, such as lightness in hair, skin and eye pigmentation, are specific to Europe and diverse throughout the continent. I present a survey of variation in these phenotypes among population samples collected in 2005 and 2006 in the US and Europe (by me and collaborators, with proper consent from participants), including some 1,000 European Americans and 720 Europeans, and focusing on samples of French, US Ashkenazi Jewish, Irish, Italian, Polish and Portuguese descent (respectively 169, 39, 147, 140, 84 and 187 individuals). In order to study genetic variation in Europe, a variety of population samples without phenotypes was also assembled. These samples from Europe and neighboring regions were obtained from various sources, including the CEPH-HGDP, the Coriell Institute, and many collaborators. The genetic ancestry and population structure of Europe were first investigated through a commercial genetic test, EuroDNA 1.0 by the company DNAPrint (Sarasota, FL). In parallel, this allowed an evaluation of the usefulness and performance of this test for predicting individual ancestry and possibly pigmentation phenotypes; both possibilities appeared rather limited, mainly because of the low informativeness and small size of the marker panel used. I then report on genome-wide typing of 297 individuals for ca. 10,000 (10k) single nucleotide polymorphisms (SNPs) using the Affymetrix (Santa Clara, CA) 10k mapping array; the data reveal significant axes of population structure in Europeans of known and unknown ancestry, mainly differentiating Iberians, northern Europeans and southeastern Europeans, as well as Basque and Finnish individuals.
A proper understanding of population genetic stratification (differences in individual ancestry within or among populations) is crucial in attempts to find genes for complex traits through association mapping. This section, using the 10k array, demonstrates the selection and application of EuroAIMs (European Ancestry Informative Markers) for ancestry estimation and correction of stratification, using validated Bayesian analysis (the structure program) and Principal Coordinate Analysis (PCoA) of individual allele sharing distances. Based on the latter, I present two methods for detecting and measuring genetic population structure in individual population samples: one based on ANOVA, the other an expansion of a split-half reliability method. The Coriell “Caucasian” and CEPH Utah sample panels, often used as proxies for European populations, are found to reflect different subsets of the continent’s ancestry. Finally, using Illumina 317k mapping array data from both 29 population sample pools and 180 individual genotypes (mainly from Ireland, Italy, Poland and Portugal), I confirm and expand the description of European population structure. A preliminary panel of 12k EuroAIMs was selected from the population pools and applied to the 180 European individuals, who cluster apart based on country of origin (Ireland, Italy, Poland and Portugal). Individuals from other countries are also presented in this context. The availability of pigmentation phenotypes for these individuals will allow gene association studies and possibly admixture mapping. I finally discuss the potential of such data for biomedical research and for understanding human genetic variation, as well as the danger of overstating commercial genetic test results to the public.
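
As an illustration of the PCoA step mentioned in the abstract, the following is a minimal sketch, not the dissertation's implementation. It assumes genotypes are coded as 0/1/2 allele counts and takes the allele sharing distance between two individuals to be the mean number of non-shared alleles per SNP divided by two; the abstract does not state the exact definition used, so this coding and formula are assumptions. The function names and the toy genotype matrix are hypothetical.

```python
import numpy as np


def allele_sharing_distance(genotypes):
    """Pairwise allele sharing distance (assumed definition).

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    Returns an (n, n) symmetric matrix with values in [0, 1].
    """
    g = np.asarray(genotypes, dtype=float)
    # |g_i - g_j| is the number of non-shared alleles at each SNP (0, 1 or 2).
    diff = np.abs(g[:, None, :] - g[None, :, :])
    return diff.mean(axis=2) / 2.0


def pcoa(dist, n_axes=2):
    """Classical PCoA (metric MDS) of a distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances (Gower centering).
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the axes with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_axes]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues (non-Euclidean distances, round-off) are clipped to 0.
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))


# Hypothetical toy data: 4 individuals x 4 SNPs, two loose groups.
toy = np.array([
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [2, 2, 1, 0],
    [2, 1, 1, 0],
])
dist = allele_sharing_distance(toy)
coords = pcoa(dist, n_axes=2)
```

In this toy example the first coordinate axis separates the first two individuals from the last two, which is the kind of clustering the abstract describes at a much larger scale. The pairwise broadcast in `allele_sharing_distance` is memory-hungry and would need to be processed in blocks for array-scale SNP counts.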