COMPUTATIONAL TOOLS AND THEIR APPLICATIONS IN PLANT COMPARATIVE GENOMICS

Open Access
Author:
Wall, P. Kerr
Graduate Program:
Biology
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
June 30, 2008
Committee Members:
  • Claude Walker Depamphilis, Committee Chair
  • Webb Colby Miller, Committee Chair
  • Hong Ma, Committee Member
  • Naomi S Altman, Committee Member
  • John Edward Carlson, Committee Member
Keywords:
  • Transcriptomics
  • Phylogenomics
  • Evolutionary Biology
  • Gene Family Evolution
  • Comparative Genomics
  • Computational Biology
  • Bioinformatics
  • Next Generation Sequencing
Abstract:
The integration and advancement of molecular biology, evolutionary biology, and computer science over the past few decades have led to the development of several new fields of study. Comparative genomics, the study of the similarities and differences between two or more genomes, continues to be fueled by the rapidly growing number of fully sequenced genomes in public databases. As of 2008, the plant science community has sequenced ten plant genomes, with plans to sequence more than twenty genomes over the next few years. There is therefore a need for flexible, gene-family-focused databases that provide rich toolsets for comparative analyses of plants. The PlantTribes database is based on the results of a series of controlled protein clustering experiments performed at multiple stringencies, which produce sets of objectively defined plant gene families. Nearly a dozen published articles to date have relied on data extracted from PlantTribes. These include the recent Populus and papaya genome sequence papers, as well as studies of expression divergence following gene duplication, identification of gene families for intensive phylogenetic analysis, identification of microRNAs and their associated targets, and the genome duplication history of basal angiosperms. Comparative genomics has also been aided by rapid advancements in sequencing technologies over the last few decades. Next Generation (NG) sequencing technologies have become a great resource to the genomics community because of their extremely low per-base cost of sequencing. A simulation approach was developed to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. In terms of sequence coverage alone, the NG sequencing platforms are a dramatic advance over capillary-based sequencing.
Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding Next Generation transcriptome sequencing projects in a wide range of organisms.
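The abstract describes simulating transcriptome sequencing to compare mixtures of sequencing methods. The sketch below illustrates that general idea only. It is not the dissertation's simulator, and every modeling choice in it is a hypothetical assumption: the platform names, read lengths, and per-read costs; lognormal expression levels; uniform read start positions; and "fraction of transcriptome bases covered at least once" as the outcome. Reads are assigned to transcripts in proportion to expression × length. Coverage is then compared across mixtures that share the same budget.

```python
import random
from dataclasses import dataclass


@dataclass
class Platform:
    # Hypothetical platform description; the values used below are illustrative only.
    name: str
    read_length: int
    cost_per_read: float


def make_transcriptome(n_transcripts, rng):
    """Return (lengths, weights) for a toy transcriptome.

    Assumed model: lengths uniform in [300, 3000] bp, expression lognormal.
    """
    lengths = [rng.randint(300, 3000) for _ in range(n_transcripts)]
    expression = [rng.lognormvariate(0.0, 1.5) for _ in range(n_transcripts)]
    # A transcript's chance of yielding a read scales with expression * length.
    weights = [e * l for e, l in zip(expression, lengths)]
    return lengths, weights


def simulate(lengths, weights, mixture, rng):
    """Sample reads for each (platform, n_reads) pair in `mixture`.

    Returns (fraction of transcriptome bases covered at least once, total cost).
    """
    covered = [bytearray(length) for length in lengths]  # 1 = base seen at least once
    cost = 0.0
    indices = range(len(lengths))
    for platform, n_reads in mixture:
        cost += platform.cost_per_read * n_reads
        if n_reads == 0:
            continue
        picks = rng.choices(indices, weights=weights, k=n_reads)
        for t in picks:
            length = lengths[t]
            start = rng.randrange(length)  # uniform start position within the transcript
            end = min(start + platform.read_length, length)  # clip reads at the transcript end
            covered[t][start:end] = b"\x01" * (end - start)
    seen = sum(sum(c) for c in covered)
    return seen / sum(lengths), cost


if __name__ == "__main__":
    lengths, weights = make_transcriptome(200, random.Random(42))
    long_reads = Platform("long-read (hypothetical)", 250, 0.01)
    short_reads = Platform("short-read (hypothetical)", 36, 0.0005)
    mixtures = {
        "long only": [(long_reads, 2000)],
        "short only": [(short_reads, 40000)],
        "half and half": [(long_reads, 1000), (short_reads, 20000)],
    }
    for label, mixture in mixtures.items():
        frac, cost = simulate(lengths, weights, mixture, random.Random(1))
        print(f"{label:14s} cost={cost:6.2f} bases covered={frac:.3f}")
```

All three example mixtures cost the same (20 arbitrary units), so their coverage values can be compared directly. A fuller simulator would also model error rates, assembly, and the detection of low-expression transcripts, none of which this toy version attempts.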