Open Access
Zhang, Ya
Graduate Program:
Information Systems
Doctor of Philosophy
Document Type:
Date of Defense:
April 29, 2005
Committee Members:
  • Chao Hsien Chu, Committee Chair
  • Hongyuan Zha, Committee Chair
  • C Lee Giles, Committee Member
  • James Z Wang, Committee Member
  • Liwang Cui, Committee Member
Keywords:
  • microarray analysis
  • protein domain
  • protein interaction prediction
  • bioinformatics
  • machine learning
With the completion of the Human Genome Project, the study of proteins and their functions has become a major focus of biological research. Of particular interest are protein interactions, which are central to determining cellular functions because proteins seldom act alone. High-throughput experiments have produced a large volume of data on pairwise protein-protein interactions. However, these data contain many false negatives (i.e., interactions missing from the data) and false positives (i.e., spurious interactions). Our goal in analyzing the pairwise interaction data is to extract coherent information and to predict unobserved interactions from the experimental interaction data.

As proteins are assumed to interact through their domains, which are considered the building blocks of proteins, we adopt a domain-based approach to inferring interactions. We propose a new learning framework that models interaction inference as a constraint satisfiability problem and solves it as a linear program. To handle cases in which multiple domains contribute to a single interaction, a hyperclique-pattern-based method is used to select domain combinations, each of which is then treated as a single unit of interaction.

Domain-based approaches require a reasonable assignment of domains. However, the ambiguity of domain definitions adds another layer of difficulty to the inference. We therefore investigate the consensus between domain definitions through a comparative mapping of two types of domain definitions. Where the definitions disagree, the functional and evolutionary characteristics of the domains are examined to determine which definition is biologically more informative.

One limitation shared by all domain-based interaction inference methods is that domain composition is treated as the sole determinant of interactions. The presence of a pair of interacting domains in a pair of proteins only establishes the potential for the two proteins to interact; in a real biological setting, it does not necessarily mean that they will interact. We therefore use protein expression profiles to filter out spurious interactions. Because each protein may participate in a number of biological processes, and thus interacts with different proteins at different cellular stages, we discover locally co-expressed protein clusters by biclustering time-series gene expression data.
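The hyperclique step mentioned above can be illustrated with a minimal sketch. It assumes the standard h-confidence measure from the hyperclique-pattern literature, h-conf(P) = supp(P) / max over domains i in P of supp({i}), and treats each protein as a transaction whose items are its domains. The domain labels A to E, the toy protein list, and the threshold `min_hconf` are all hypothetical; this is not the dissertation's actual procedure or parameter setting.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def support(itemset: FrozenSet[str], proteins: List[Set[str]]) -> float:
    """Fraction of proteins whose domain set contains every domain in itemset."""
    hits = sum(1 for domains in proteins if itemset <= domains)
    return hits / len(proteins)


def h_confidence(itemset: FrozenSet[str], proteins: List[Set[str]]) -> float:
    """h-confidence: supp(P) divided by the largest single-domain support in P."""
    max_single = max(support(frozenset([d]), proteins) for d in itemset)
    if max_single == 0:
        return 0.0
    return support(itemset, proteins) / max_single


def hyperclique_patterns(
    proteins: List[Set[str]], min_hconf: float, max_size: int = 3
) -> Dict[FrozenSet[str], float]:
    """Exhaustively score domain combinations of size 2..max_size (toy-scale only)."""
    domains = sorted(set().union(*proteins))
    patterns: Dict[FrozenSet[str], float] = {}
    for k in range(2, max_size + 1):
        for combo in combinations(domains, k):
            candidate = frozenset(combo)
            score = h_confidence(candidate, proteins)
            if score >= min_hconf:
                patterns[candidate] = score
    return patterns


# Hypothetical toy data: each set lists the (made-up) domains of one protein.
proteins = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "D"},
    {"C", "D"},
    {"E"},
]

# Hypothetical threshold; combinations at or above it would become single units.
units = hyperclique_patterns(proteins, min_hconf=0.8)
for pattern, score in units.items():
    print(sorted(pattern), score)  # prints: ['A', 'B'] 1.0
```

With this toy data, only {A, B} reaches an h-confidence of 0.8, so A and B would be candidates for treatment as one interaction unit. The exhaustive enumeration is practical only for small domain vocabularies; because h-confidence is anti-monotone, a level-wise miner can discard every superset of a combination that fails the threshold.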