Advancing Predictive Modeling on Electronic Health Records: From Handcrafted to Automated Methods

Open Access
- Author:
- Cui, Suhan
- Graduate Program:
- Informatics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- February 07, 2025
- Committee Members:
- Sharon Huang, Major Field Member
Qiushi Chen, Outside Unit & Field Member
Dongwon Lee, Chair & Dissertation Advisor
Minhao Cheng, Major Field Member
Prasenjit Mitra, Special Member
Carleen Maitland, Program Head/Chair
- Keywords:
- Electronic Health Records
Machine learning
Deep Learning
Health Informatics
Automated Machine Learning
Multi-Modal Learning
Multi-Task Learning
Knowledge Graph
Text Mining
- Abstract:
- Electronic Health Record (EHR) systems are widely adopted across healthcare institutions, collecting vast amounts of patient data and serving as a foundation for healthcare research. These systems enable exploratory and predictive analytics, facilitating advances in medical applications such as disease diagnosis, treatment recommendation, and patient monitoring. In recent years, researchers and clinicians have increasingly leveraged machine learning (ML) techniques to analyze EHR data. However, developing effective ML models for EHR data remains a significant challenge, primarily because it relies on human experts to handcraft these models. This process demands expertise in both ML and medicine and involves labor-intensive effort to design and optimize model architectures. The resulting models are often tailored to specific datasets or tasks, which limits their generalizability. Consequently, there is a pressing need for approaches that streamline ML model development for EHR data, minimizing the reliance on domain expertise and manual effort while enhancing model performance across different scenarios. This dissertation focuses on advancing predictive modeling for EHR data, transitioning from handcrafted to automated methodologies. In the first part, we propose two handcrafted frameworks, MedPath and MedRetriever, which enhance predictive models using external medical knowledge sources, including knowledge graphs (KGs) and medical texts. These frameworks improve model performance and interpretability while reducing reliance on domain-specific expertise. Notably, they can integrate seamlessly with any existing predictive model, offering improved performance across diverse datasets and tasks. In the second part, we introduce automated frameworks designed to address predictive modeling challenges in multi-modal and multi-task learning for EHR data. These include AutoMed, AutoFM, and AutoDP, which leverage data-driven approaches to design model architectures automatically. By eliminating the need for manual intervention, these methods enhance performance and generalizability across various datasets and tasks. Moreover, they further reduce the labor required to design model architectures compared with handcrafted methods. Overall, the methodologies presented in this dissertation, both handcrafted and automated, provide general solutions for EHR modeling, improving model applicability and performance across diverse scenarios.