Advancing Predictive Models in Healthcare: From Single to Multi-Modality
Restricted (Penn State Only)
- Author:
- Luo, Junyu
- Graduate Program:
- Informatics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- February 21, 2024
- Committee Members:
- Dongwon Lee, Professor in Charge/Director of Graduate Studies
Sharon Huang, Major Field Member
Fenglong Ma, Chair & Dissertation Advisor
Rui Zhang, Outside Unit & Field Member
Lu Lin, Major Field Member
- Keywords:
- Healthcare Data
Predictive Model
Deep Learning
- Abstract:
- The advent of predictive machine learning models has revolutionized healthcare, making it more reliable and cost-effective. These models find extensive applications in disease diagnosis, treatment planning, patient monitoring, and public health management. The development of advanced predictive models, however, confronts unique challenges in the healthcare domain, primarily due to the multi-modal nature of healthcare data, which encompasses text, images, codes, and laboratory results. Such diversity in data types presents significant hurdles in model development and integration, necessitating innovative approaches for effective data processing and analysis. This thesis delves into the evolution of advanced predictive models in healthcare, transitioning from single-modal to multi-modal data frameworks while addressing the inherent complexities associated with each. It particularly focuses on the unique challenges posed by different data modalities, such as text and drug data, and underscores the importance of integrating these diverse sources into unified, effective models. The first part of this thesis presents novel methodologies for developing single-modality predictive models in healthcare. This includes HiTANet, a hierarchical time-aware attention network for disease risk prediction using a single modality, International Classification of Diseases (ICD) codes. HiTANet represents a significant leap in modeling time-sensitive disease progression. Additionally, this thesis explores the text modality for clinical trial outcome prediction and ICD coding. This encompasses the introduction of an automated model and dataset for predicting clinical trial outcomes, along with two novel approaches for ICD code prediction from clinical notes: Fusion, which addresses redundant and noisy clinical note data, and CoRelation, which uses graph modeling of external knowledge to boost ICD coding precision.
Shifting the focus to multi-modal predictive models, the second part of this thesis introduces pADR, a novel personalized model for predicting adverse drug reactions. This model integrates diverse data sources, tackling the challenge of balancing different modalities to enhance prediction accuracy. The effectiveness of these models is substantiated through comprehensive testing on multiple datasets, demonstrating their superiority over existing methodologies. This dissertation makes substantial contributions to healthcare data analytics by developing cutting-edge predictive models tailored for the unique aspects of single and multi-modal healthcare data. Its methods, applicable in industrial settings, extend to other domains requiring advanced data analysis.