Leveraging Machine Learning for Analyzing Individual and Aggregate-Level Healthcare Data

Restricted (Penn State Only)
- Author:
- Liu, Meng
- Graduate Program:
- Industrial Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 24, 2023
- Committee Members:
- Steven Landry, Program Head/Chair
Soundar Kumara, Chair, Minor Member & Dissertation Advisor
Paul Griffin, Major Field Member
Kamesh Madduri, Outside Unit & Field Member
Swaminathan P. Iyer, Special Member
Qiushi Chen, Major Field Member
Kartik Ramakrishna, Special Member
- Keywords:
- Machine Learning
Healthcare
data analytics
- Abstract:
- The widespread availability of electronic health records (EHRs) presents a unique opportunity to utilize machine learning for analyzing healthcare data. EHRs contain a wealth of information, encompassing individual and aggregate-level healthcare data, which can be harnessed to derive valuable insights for patient care and public health management. Machine learning techniques are particularly well-suited for this task due to their ability to model complex relationships, learn patterns from large-scale data, and make accurate predictions. By employing advanced algorithms and data-driven approaches, machine learning can help uncover hidden trends and generate actionable insights from diverse healthcare datasets. This dissertation aims to explore the application of machine learning techniques to analyze these various data types, focusing on the transition from EHRs to structured individual and aggregate-level healthcare data. To facilitate this transition, the research addresses the challenges associated with data preprocessing, integration, and analysis, developing innovative methods for converting raw EHR data into structured formats suitable for machine learning algorithms. This dissertation addresses 1) potential drug-drug interaction detection and post-market surveillance with pharmacovigilance data, 2) sleep health analysis with actigraphy data, and 3) COVID-19 analytics with aggregate-level epidemiological data. Three kinds of data are considered: 1) individual-level data obtained from pharmacological studies on drug-drug interactions; 2) individual and aggregate-level data with temporal aspects incorporated; and 3) aggregate-level population data.
In Chapter 2, the focus is on analyzing individual-level pharmacovigilance data, specifically adverse event analysis, to detect potential drug-drug interactions and investigate the safety of COVID-19 vaccines. This case study demonstrates the utility of machine learning in identifying and mitigating risks associated with drug combinations and vaccine post-market surveillance. In Chapter 3, the analysis shifts to individual-level longitudinal data, such as actigraphy data, to improve the prediction of sleep-wake states and provide a reliable estimation of sleep parameters. This case study showcases the potential of machine learning algorithms in enhancing the understanding of sleep patterns and promoting better sleep health practices. In Chapter 4, the research investigates aggregate-level healthcare data, focusing on COVID-19 epidemiological data. The case study emphasizes the application of machine learning techniques to address and solve problems related to the COVID-19 pandemic. One specific problem examined is the deviations in predicted COVID-19 cases in the US during the early months of 2021, which can be attributed to the emergence and spread of the B.1.526 variant and its associated subvariants. Through this analysis, the study demonstrates the power of machine learning in uncovering the impact of emerging variants on the pandemic’s trajectory and informing public health decision-making.
The three contexts considered in the dissertation lead to three related insights:
1. Individual parameters and external parameters (such as drug composition) can introduce complexity through multilevel interactions. Even so, by decomposing the problem (for example, into anticoagulants and their interactions), it is possible to build complex decision analysis mechanisms with explainability at both the local and global levels.
2. Analyzing longitudinal and dynamic data, such as those derived from actigraphy devices, may seem straightforward but can present intriguing challenges. Specifically, within the context of sleep-wake cycles, it can be complex to distinguish between sleep and wakefulness based on individual data patterns. This difficulty is exacerbated by imbalanced data.
3. Community-level data, particularly on the impact of COVID-19 on various population groups, present a unique challenge: understanding the effects of COVID-19 variants on case and death rates across different geographical locations and time periods. In this context, it is crucial to discern the role of key variables. This dissertation employs relative importance analysis to provide critical insights into the impact of the COVID-19 variant B.1.1.7 across various states over time.