QUANTITATIVE FUNCTIONAL MEASUREMENT OF A PROTEIN USING PHYLOGENETIC PROFILES

Open Access
- Author:
- Ko, Kyung Dae
- Graduate Program:
- Integrative Biosciences
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- February 26, 2009
- Committee Members:
- Randen L Patterson, Dissertation Advisor/Co-Advisor
Randen L Patterson, Committee Chair/Co-Chair
Reka Z Albert, Committee Member
Anton Nekrutenko, Committee Member
Michael N Teng, Committee Member
Damian Van Rossum, Committee Member
- Keywords:
- the function of protein
proteomics
bioinformatics
RNA binding protein
- Abstract:
- In principle, the amino acid sequence of a protein encodes its structural, functional, and evolutionary characteristics. Computational methods for investigating these characteristics are a powerful resource, but they remain limited in how accurately they can annotate proteins. To address this limitation, I developed a unified computational pipeline, the Gestalt Domain Detection Algorithm Basic Local Alignment Tool (GDDA-BLAST), which measures the structural, functional, and evolutionary characteristics of a protein using phylogenetic profiles. In homology detection, GDDA-BLAST outperforms other methods such as SAM and PSI-BLAST. Using GDDA-BLAST, I also implemented a classification library to find quantitative thresholds capable of inferring protein function from phylogenetic profiles. Using this library with 2695 expanded Position Specific Scoring Matrix (PSSM) profiles, I identified RNA-binding proteins (RBPs) containing structurally unique motifs in a testing dataset of 37 positive and 118 negative sequences, achieving 100% specificity, 96.8% accuracy, and 86.5% sensitivity. For specific nucleotide-binding folds (dsRNA vs. dsDNA and ssRNA vs. ssDNA), the results exceeded those obtained using Support Vector Machine (SVM) learning algorithms. Using this method, I also identified 29 and 168 novel RBPs in the yeast and human proteomes, respectively. These results suggest that this method can be used to create PSSM databases for the quantitative measurement and classification of any protein function.
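The three figures reported in the abstract can be checked against each other with a short calculation. The sketch below is illustrative only: the abstract gives the dataset sizes (37 positive, 118 negative) and the percentages, but not the individual confusion-matrix counts. The split of 32 true positives and 5 false negatives, with no false positives, is an assumption inferred from the stated 86.5% sensitivity and 100% specificity, and the function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = tp / (tp + fn)                 # share of true RBPs recovered
    specificity = tn / (tn + fp)                 # share of non-RBPs correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all sequences classified correctly
    return sensitivity, specificity, accuracy


# Dataset sizes come from the abstract: 37 positive and 118 negative sequences.
# The individual counts below are ASSUMED, chosen to match the reported percentages.
TP, FN = 32, 5     # assumed: 32 of 37 positives detected
TN, FP = 118, 0    # assumed: all 118 negatives rejected (100% specificity)

sens, spec, acc = confusion_metrics(TP, FN, TN, FP)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

Under these assumed counts the three reported values are mutually consistent: 32/37 rounds to 86.5%, 118/118 is 100%, and 150/155 rounds to 96.8%.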