Formal Methods for Genomic Data Integration
Open Access
- Author:
- Shah, Nigam
- Graduate Program:
- Integrative Biosciences
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 07, 2005
- Committee Members:
- Nina Fedoroff, Committee Chair/Co-Chair
- Mark Shriver, Committee Member
- Wojciech Makalowski, Committee Member
- Francesca Chiaromonte, Committee Member
- Gustavo A Stolovitzky, Committee Member
- Keywords:
- Microarray
- ontology
- modeling biological processes
- knowledgebases
- Abstract:
- Over the past decade, the rapid growth of life sciences research and its associated literature, the expansion of biological databases, and the invention of high-throughput techniques that collect data on many genes and proteins simultaneously have created an acute need for new computational tools that help biologists collect, evaluate and integrate large amounts of information of many disparate kinds. This thesis presents methods for representing, manipulating and conceptually integrating diverse biological data with prior biological knowledge, to facilitate both the interpretation of data and the evaluation of hypotheses. We developed a tool, CLENCH, that assists in interpreting gene lists resulting from microarray data analysis by integrating and visualizing Gene Ontology (GO) annotations and transcription factor binding site information with gene expression data. During the development of CLENCH, it became evident that a unified framework for representing prior knowledge and information would increase our ability to integrate new data with existing knowledge.
In subsequent work, we developed the HyBrow (Hypothesis Browser) system as a prototype tool for designing hypotheses and evaluating them for consistency with existing knowledge. HyBrow consists of a conceptual framework able to represent diverse types of biological information, an ontology for describing biological processes at different levels of detail, a database for querying information represented in the ontology, and programs to design, evaluate and revise hypotheses. We demonstrate the HyBrow prototype using the galactose gene network in Saccharomyces cerevisiae as a test system.
As available information increases, knowledgebases, which provide structured descriptions of biological processes, are proliferating rapidly. To support computer-aided information integration tools such as HyBrow, a knowledgebase should be trustworthy, and it should structure information expressively enough to represent biological systems at multiple scales. We extend and adapt the conceptual framework underlying HyBrow and use it to verify the trustworthiness and usefulness of the Reactome knowledgebase.
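HyBrow's central step, evaluating a hypothesis for consistency with existing knowledge, can be illustrated with a toy sketch. This is not HyBrow's implementation. It assumes, for illustration only, that both the knowledge and the hypothesis are sets of (subject, relation, target) assertions, and that the only conflict rule is a hypothetical pair of opposing relations, `activates` and `inhibits`. The names `Assertion`, `OPPOSITES` and `evaluate` are hypothetical, and the three galactose-network statements form a deliberately tiny knowledge base.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical conflict rule: each relation maps to the relation that contradicts it.
# A real hypothesis-evaluation ontology would encode far richer constraints.
OPPOSITES = {"activates": "inhibits", "inhibits": "activates"}


@dataclass(frozen=True)  # frozen makes instances hashable, so they can live in a set
class Assertion:
    """A single statement of the form: subject --relation--> target."""
    subject: str
    relation: str
    target: str


def evaluate(hypothesis: Iterable[Assertion],
             knowledge: set[Assertion]) -> dict[str, list[Assertion]]:
    """Classify each hypothesis assertion as supported, contradicted or unknown."""
    result: dict[str, list[Assertion]] = {"supported": [], "contradicted": [], "unknown": []}
    for a in hypothesis:
        opposite = OPPOSITES.get(a.relation)
        if a in knowledge:
            # The same statement is already recorded.
            result["supported"].append(a)
        elif opposite is not None and Assertion(a.subject, opposite, a.target) in knowledge:
            # The knowledge base records the opposing relation between the same entities.
            result["contradicted"].append(a)
        else:
            # Nothing recorded either way: a gap, not an error.
            result["unknown"].append(a)
    return result


# Illustrative toy knowledge base drawn from the yeast galactose network.
knowledge = {
    Assertion("Gal4p", "activates", "GAL1"),
    Assertion("Gal80p", "inhibits", "Gal4p"),
    Assertion("Gal3p", "inhibits", "Gal80p"),
}

hypothesis = [
    Assertion("Gal4p", "activates", "GAL1"),    # matches a recorded fact
    Assertion("Gal80p", "activates", "Gal4p"),  # opposes a recorded fact
    Assertion("Gal4p", "activates", "GAL7"),    # absent from this toy knowledge base
]

report = evaluate(hypothesis, knowledge)
for category, items in report.items():
    print(category, [(x.subject, x.relation, x.target) for x in items])
```

Keeping `unknown` separate from `contradicted` is a deliberate choice in this sketch: a missing statement points to a gap in the knowledge base rather than to an error in the hypothesis, which is the distinction a revision step would need.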