INFERENCE OF ORTHOLOGS, WHILE CONSIDERING GENE CONVERSION, TO EVALUATE WHOLE-GENOME MULTIPLE SEQUENCE ALIGNMENTS

Open Access
- Author:
- Hsu, Chih-Hao
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- October 02, 2009
- Committee Members:
- Webb Colby Miller, Dissertation Advisor/Co-Advisor
Webb Colby Miller, Committee Chair/Co-Chair
Raj Acharya, Committee Member
Wang Chien Lee, Committee Member
Ross Cameron Hardison, Committee Member
- Keywords:
- Evolution
Gene conversion
Orthology
Multiple-Sequence Alignment
- Abstract:
- The problem of computing a multiple-sequence alignment (MSA) is very important for the analysis of biological sequences. An equally critical problem is to evaluate the quality of an alignment. In the preliminary project described here, alignments produced by Multiz and ROAST of the human genome to other vertebrate genomes are evaluated using orthologous genes in 13 gene clusters from 6 mammalian species; the orthologs are identified using maximum-likelihood phylogenetic tree reconstruction methods. Analysis of the α- and β-globin gene clusters shows that the inferred ortholog relationships are accurate. The orthologous β-globin genes from over 14 species are used to evaluate the performance of four MSA programs (MLAGAN, MAVID, TBA and ROAST). The results show that the performance of ROAST is superior to that of the others. Furthermore, differences among gene clusters and among species are studied. This approach not only indicates the quality of a given alignment, but also helps us understand the alignment’s drawbacks and gives us some clues about how to build the next generation of multiple alignment programs. To obtain accurate orthologs, the impact of gene conversion is studied in this thesis. Gene conversion events are often overlooked in analyses of genome evolution. In such an event, an interval of DNA sequence (not necessarily containing a gene) overwrites a highly similar sequence. The event creates relationships among genomic intervals that can confound prediction of orthologs and attempts to transfer functional information between genomes. Here we propose different gene conversion detection methods for different scales of data. Detailed information about conversion events between gene pairs is determined, including their directionality. Furthermore, we analyze 1,112,202 highly conserved pairs of human genomic intervals, and detect a conversion event for about 13.5% of them.
Properties of the putative gene conversions are analyzed, such as the distributions of the lengths of the converted regions and the spacing between source and target. Finally, we also apply our method to several well-studied gene clusters, including the globin genes.
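
The gene conversion event described in the abstract (an interval of one sequence overwriting the corresponding interval of a highly similar sequence) can be illustrated with a toy sketch. This is not the detection method of the dissertation. The sequences, the function names `gene_conversion` and `identity`, and the assumption of equal-length, already-aligned paralogs are all hypothetical and chosen only for illustration.

```python
def gene_conversion(source: str, target: str, start: int, end: int) -> str:
    """Return a copy of `target` whose interval [start, end) is overwritten
    by the same interval of `source` (toy model: sequences are pre-aligned
    and of equal length, which real paralogs generally are not)."""
    if len(source) != len(target):
        raise ValueError("toy model assumes equal-length, pre-aligned sequences")
    if not 0 <= start <= end <= len(source):
        raise ValueError("interval out of range")
    return target[:start] + source[start:end] + target[end:]


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical paralogs that differ at 4 of 20 positions.
paralog_a = "ACGTACGTACGTACGTACGT"
paralog_b = "ACGAACGTTCGTACCTACGA"

# Conversion with paralog_a as the source and paralog_b as the target.
converted_b = gene_conversion(paralog_a, paralog_b, 5, 15)

before = identity(paralog_a, paralog_b)
after = identity(paralog_a, converted_b)
inside = identity(paralog_a[5:15], converted_b[5:15])
```

In this toy case the converted interval becomes identical to the source while the flanking positions keep their original differences. A locally elevated similarity between paralogs of this kind is what can mislead a tree-based ortholog assignment when the converted region dominates the comparison.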