COMPUTATIONAL TOOLS AND THEIR APPLICATIONS IN PLANT COMPARATIVE GENOMICS

Open Access
- Author:
- Wall, P. Kerr
- Graduate Program:
- Biology
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 30, 2008
- Committee Members:
- Claude Walker Depamphilis, Committee Chair/Co-Chair
Webb Colby Miller, Committee Chair/Co-Chair
Hong Ma, Committee Member
Naomi S Altman, Committee Member
John Edward Carlson, Committee Member
- Keywords:
- Transcriptomics
Phylogenomics
Evolutionary Biology
Gene Family Evolution
Comparative Genomics
Computational Biology
Bioinformatics
Next Generation Sequencing
- Abstract:
- The integration of, and advances in, molecular biology, evolution, and computer science over the past few decades have led to the development of several new fields of study. Comparative genomics, the study of the similarities and differences between two or more genomes, continues to be fueled by the rapidly growing number of fully sequenced genomes in public databases. As of 2008, the plant scientific community has sequenced ten plant genomes, with plans to sequence more than twenty genomes over the next few years. There is therefore a need for flexible, gene-family-focused databases that provide rich toolsets for comparative analyses of plants. The PlantTribes database is based on the results of a series of controlled protein clustering experiments performed at multiple stringencies, which produce sets of objectively defined plant gene families. Nearly a dozen published articles to date have relied on data extracted from PlantTribes, including the recent Populus and papaya genome sequence papers and studies of expression divergence following gene duplication, identification of gene families for intensive phylogenetic analysis, identification of microRNAs and their associated targets, and the genome duplication history of basal angiosperms.
Comparative genomics has also been aided by rapid advances in sequencing technologies over the last few decades. Next Generation (NG) sequencing technologies have become a great resource to the genomics community because of the extremely low ‘per base’ cost of sequencing. A simulation approach was developed to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. In terms of sequence coverage alone, the NG sequencing platforms are a dramatic advance over capillary-based sequencing. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding Next Generation transcriptome sequencing projects in a wide range of organisms.