Advancing Biomedical Exploration: Leveraging Biomedical Knowledge Graphs with Computational Methods

Open Access
- Author:
- Ma, Chunyu
- Graduate Program:
- Bioinformatics and Genomics (PhD)
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 10, 2024
- Committee Members:
- David Koslicki, Program Head/Chair
Suhang Wang, Outside Unit Member
David Koslicki, Chair & Dissertation Advisor
Mehrdad Mahdavi, Outside Field Member
Reka Albert, Major Field Member
- Keywords:
- Knowledge Graph
Computational Method
Biomedical Exploration
- Abstract:
- Biomedical systems are highly complex, involving multiple disciplines such as biology, chemistry, clinical medicine, and environmental science. Understanding how knowledge is related within and across these disciplines helps address a wide range of health-related issues (e.g., drug repurposing, pathogen detection). However, a single type of data or technique within a specific discipline often falls short in uncovering hidden connections across disciplines and fails to capture multiple views or provide a holistic understanding of the relevant biomedical mechanisms. Although practical applications are still at an early stage, researchers have demonstrated the success of knowledge graph techniques in integrating unstructured knowledge into a structured semantic graph, which makes it possible to explore unknown relationships and properties based on existing biomedical knowledge. In this dissertation, we aim to further advance the applications of knowledge graphs to biomedical issues by combining them with a novel machine learning (ML) method, a querying and reasoning system, and other computational algorithms. In Chapter 1, we first introduce the fundamental concepts of biomedical knowledge graphs (BKGs), existing BKG applications, a standardized data integration framework (i.e., the Biolink model), and a large-scale standardized BKG (i.e., RTX-KG2), which serve as the data standard and resource for most subsequent chapters. The chapter also discusses the existing problems and challenges of building and applying BKGs and summarizes the existing computational methods for BKG analysis. Basic background on metagenomics is also included to support Chapters 4 and 5. In Chapter 2, we present KGML-xDTD, a novel ML-based framework for enhancing the accuracy and biological interpretability of drug predictions by incorporating biomedical knowledge graphs.
KGML-xDTD combines the strengths of several machine learning models to capture different information (e.g., node attributes, graph structure) from biomedical knowledge graphs, and it uses biological demonstration paths to guide a reinforcement learning agent toward biologically reasonable BKG paths that serve as mechanism explanations. We also demonstrate its effectiveness via two case studies. In Chapter 3, we leverage biomedical knowledge graphs to answer biomedical questions and hypotheses, and we develop a querying and reasoning system called ARAX to achieve this goal. This system can efficiently and dynamically integrate more than 100 biomedical knowledge sources from around 40 knowledge providers (KPs) to explore biomedical systems and questions. It combines several computational algorithms, including Fisher's exact test, Jaccard similarity, and Normalized Google Distance, as well as the drug prediction model introduced in Chapter 2, to rank and select relevant knowledge as an "answer" for a given query. Chapters 4 and 5 together narrow the application of knowledge graph techniques from broad biomedical questions to issues in the field of metagenomics, the study of the genomic content of microbial communities. The work in Chapter 4 constructs a metagenomics-focused biomedical knowledge graph, MetagenomicKG. It integrates knowledge relevant to microbe-disease relationships, including drugs, microbes (e.g., fungi, bacteria, viruses), genetic materials (e.g., genes, proteins), pathways, and diseases. The work in Chapter 5 develops a metagenomics-based statistical tool that acts as a "glue" connecting MetagenomicKG with specific metagenomic samples, which further enhances the dynamic analysis of knowledge graphs.
In Chapter 6, we summarize the contributions of this dissertation and discuss the possibility of unifying knowledge graph techniques with large language models (LLMs) to further facilitate BKG construction and exploration.
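
The abstract names Fisher's exact test, Jaccard similarity, and Normalized Google Distance as measures used in Chapter 3 to rank knowledge. As a rough illustration only, the textbook definitions of these three measures can be sketched in Python as follows. This is not ARAX code: the function names, argument conventions, and the choice of a one-sided test are assumptions made for this example, and the abstract does not say how the system computes or combines the scores.

```python
import math


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections (hypothetical helper)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google Distance from occurrence counts.

    fx, fy: number of documents mentioning each term; fxy: number mentioning
    both; n: corpus size (assumed larger than fx and fy, with fx, fy > 0).
    """
    if fxy == 0:
        return math.inf  # terms never co-occur
    lx, ly = math.log(fx), math.log(fy)
    return (max(lx, ly) - math.log(fxy)) / (math.log(n) - min(lx, ly))


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of tables whose top-left cell is >= a.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)
    upper = min(row1, col1)
    numerator = sum(
        math.comb(row1, k) * math.comb(row2, col1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / total


if __name__ == "__main__":
    # Toy inputs, invented purely for illustration.
    print(jaccard({"A", "B", "C"}, {"B", "C", "D"}))
    print(normalized_google_distance(10, 10, 10, 1000))
    print(fisher_exact_greater(3, 0, 0, 3))
```

In a ranking setting, a higher Jaccard score, a lower distance, or a smaller p-value would each suggest a stronger association between two concepts; how such scores are weighted against one another is a design decision this sketch does not address.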