Understanding Gene Expression and Genetic Recombination by Next Generation Sequencing

Open Access
- Author:
- Han, Xinwei
- Graduate Program:
- Genetics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- February 06, 2012
- Committee Members:
- Hong Ma, Dissertation Advisor/Co-Advisor
Stephen Wade Schaeffer, Committee Chair/Co-Chair
Gong Chen, Committee Member
Naomi S Altman, Special Member
Anton Nekrutenko, Committee Member
- Keywords:
- next generation sequencing
meiotic recombination
small RNA
cortex gene expression
- Abstract:
- The introduction of next-generation sequencing technologies has been changing the landscape of biological research. The plummeting cost of massive sequencing has not only led to a flourishing of genome projects, but has also opened up many opportunities in previously uncharted areas, two remarkable examples of which are sequencing different individuals of the same species and sequencing transcriptomes. With the availability of multiple genome sequences from a population, it is possible to systematically catalog natural variations and, more importantly, to investigate their genome-wide distribution to deduce functional elements through conservation or selection. Moreover, as genomic changes reflect evolution at work, a comprehensive map of natural variations serves as a basis for studying the properties and effects of evolutionary forces. For transcriptomics, sequencing has become a revolutionary way to characterize gene expression. It not only offers high replicability and unparalleled accuracy, but also requires less input material, which enables transcriptome studies in tiny structures, and needs no prior knowledge of gene structure, which allows detection of unknown transcripts. Here, three sequencing-based studies are presented. One study explored natural variations between two Arabidopsis ecotypes, Col and Ler, and then investigated one crucial evolutionary force that shapes variations, meiotic recombination. The other two studies investigated the small RNA transcriptome in Arabidopsis meiocytes and the mRNA transcriptomes of the developing mouse cortex. In the first study, sequencing and comparison of Col and Ler uncovered 249,171 SNPs, 58,085 small indels, and 2,315 large indels, with highly correlated genome-wide distributions of SNPs and small indels. Disease resistance genes contain significantly more variations, suggesting adaptation to specific environmental niches.
These variations were then used as markers to investigate meiotic recombination events, namely crossovers and gene conversions, in two tetrads, detecting 18 crossovers, 6 of which had an associated gene conversion event, and 4 independent gene conversions. The number and length of the identified recombination events suggest that Arabidopsis gene conversions are likely fewer and have shorter tracts than those in yeast. In addition, the analysis of variations in offspring plants showed that meiosis provides a rapid mechanism to generate copy number variations (CNVs) by reshuffling existing variations. In the second study, a recently developed method was applied to collect Arabidopsis meiocytes, a limited number of cells undergoing meiosis, and small RNAs were then profiled using SOLiD sequencing. Of 266 known miRNAs, 97 show expression in meiocytes. Interestingly, five miRNAs were found to account for more than half of the total miRNA expression in meiocytes, among which miR158a makes up about one third. The target genes of these five miRNAs have little or no expression in meiocytes. One putative novel miRNA was identified, which shows conservation with rice and maize. Analysis of longer reads provided clues for possible long ncRNAs in meiocytes. The mouse transcriptome study uncovered 3,758 differentially expressed genes between two critical stages of cortex development, embryonic day 18 (E18) and postnatal day 7 (P7). Neurogenesis-related genes, such as Sox4 and Sox11, were more highly expressed at E18 than at P7. In contrast, the genes encoding synaptic proteins were up-regulated from E18 to P7, suggesting that the focus of cortex development shifts from neuron generation to synapse formation. In addition, approximately 500 genes with unknown function show dramatic changes in expression level, serving as a blueprint for further experimental studies. Thousands of novel splice variants from 2,930 genes were identified, providing clues about another layer of dynamic expression.
These sequencing-based studies pushed the limits of previous work. The characterization of meiotic recombination reached single base-pair resolution. The mouse transcriptome work is one of several early studies utilizing RNA-seq. The small RNA transcriptome in meiocytes, a single cell type at microscopic scale, was profiled for the first time. All these studies provided unprecedented information about complex biological processes.