DATA-DRIVEN MODELING AND INTERPRETABLE MACHINE LEARNING WITH APPLICATIONS IN HEALTHCARE

Open Access
- Author:
- Liu, Ning
- Graduate Program:
- Industrial Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- July 22, 2019
- Committee Members:
- Soundar Kumara, Dissertation Advisor/Co-Advisor
Soundar Kumara, Committee Chair/Co-Chair
Guodong Pang, Committee Member
Eunhye Song, Committee Member
Terry Harrison, Outside Member
Ling Rothrock, Program Head/Chair
- Keywords:
- Data-Driven Modeling
Interpretable Machine Learning
Healthcare Analytics
Drug-Drug Interactions
Patient Satisfaction
- Abstract:
- The promise of machine learning in transforming all aspects of healthcare ecosystems has received global attention. Machine learning employs sophisticated algorithms to transform massive amounts of data into actionable insights and is leading the way in reshaping the healthcare industry. Owing to the unique characteristics of healthcare data and the highly regulated nature of the healthcare industry, significant challenges remain in successfully applying machine learning to healthcare. Data generated in healthcare usually come from various sources across multiple service units and agencies. Beyond issues of inconsistency and redundancy, healthcare data are generally noisy, sparse, unstructured, and heterogeneous. These data quality issues pose severe threats to the accuracy and authenticity of machine learning results. Furthermore, healthcare decisions and policies derived from machine learning models must be interpretable and intuitively understandable by health professionals. However, most of the best-performing machine learning models function as black boxes and provide no explanation of how their decisions are reached; this lack of transparency makes it difficult for humans to understand and trust model results. As in other high-stakes decision settings, understanding why a model makes a prediction is as important as the prediction itself. The surge of interest in model interpretability has led to the development of interpretable machine learning techniques.

In response to the data quality and model interpretability challenges, this dissertation explores three essential and interrelated healthcare analytics problems from the viewpoints of data-driven modeling and interpretable machine learning. In the first problem, we investigate the use of a set of health-related databases to identify high-priority drug-drug interactions (DDIs) for use in medication alerts. We propose a data-driven framework to extract useful features from FDA adverse event reports and develop an autoencoder-based semi-supervised learning algorithm to make inferences about potential high-priority DDIs. The experimental results demonstrate the effectiveness of using adverse event feature representations in differentiating high- and low-priority DDIs. Moreover, the proposed algorithm leverages stacked autoencoders and unlabeled samples to boost classification performance and outperforms competing semi-supervised methods.

The second and third problems are related to patient satisfaction studies. We focus on identifying the factors that drive patient satisfaction using insights extracted from hospital electronic health records and patient survey data. In the second problem, we propose an interpretable machine learning framework that transforms heterogeneous data into human-understandable feature representations and then uses a mixed-integer programming model to discover the major factors that influence patient satisfaction. In the third problem, we introduce a post hoc local explanation method to interpret black-box model outputs, aiming to close the gap between model decisions and the understanding of healthcare users. Results of real-world case studies show that factors related to the courtesy and respect shown by nurses and doctors, communication between health professionals and patients, and hospital discharge instructions significantly impact overall patient satisfaction.
Our approach and findings help establish guidelines for quality healthcare in the future.
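As a concrete illustration of the first problem's approach, below is a minimal sketch of autoencoder-based semi-supervised classification, assuming a PyTorch setup and a fixed-size adverse-event feature vector: pretrain a stacked autoencoder on unlabeled drug-pair features, then reuse the encoder with a small classification head on the labeled high-/low-priority DDIs. The layer sizes, training loops, and random stand-in data are illustrative assumptions, not the dissertation's implementation.

```python
# Sketch only: stacked-autoencoder pretraining on unlabeled adverse-event
# feature vectors, followed by supervised fine-tuning on labeled DDIs.
import torch
import torch.nn as nn

FEATURE_DIM = 256  # assumed size of the adverse-event feature vector


class StackedAutoencoder(nn.Module):
    def __init__(self, dims=(FEATURE_DIM, 128, 64)):
        super().__init__()
        enc, dec = [], []
        for d_in, d_out in zip(dims, dims[1:]):
            enc += [nn.Linear(d_in, d_out), nn.ReLU()]
        rev = dims[::-1]
        for d_in, d_out in zip(rev, rev[1:]):
            dec += [nn.Linear(d_in, d_out), nn.ReLU()]
        self.encoder = nn.Sequential(*enc)
        self.decoder = nn.Sequential(*dec[:-1])  # no activation on reconstruction

    def forward(self, x):
        return self.decoder(self.encoder(x))


def pretrain(model, unlabeled, epochs=20, lr=1e-3):
    """Unsupervised reconstruction on unlabeled drug-pair features."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(unlabeled), unlabeled)
        loss.backward()
        opt.step()
    return model


def fine_tune(encoder, x_labeled, y_labeled, epochs=50, lr=1e-3):
    """Supervised fine-tuning: pretrained encoder plus a linear head."""
    clf = nn.Sequential(encoder, nn.Linear(64, 2))  # 64 = bottleneck size above
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(clf(x_labeled), y_labeled)
        loss.backward()
        opt.step()
    return clf


# Toy usage with random tensors standing in for FAERS-derived features.
unlabeled = torch.randn(500, FEATURE_DIM)
x_lab, y_lab = torch.randn(40, FEATURE_DIM), torch.randint(0, 2, (40,))
sae = pretrain(StackedAutoencoder(), unlabeled)
classifier = fine_tune(sae.encoder, x_lab, y_lab)
```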
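For the second problem, one way a mixed-integer program can surface a small set of "major factors" is cardinality-constrained regression: minimize absolute prediction error while allowing at most k features to carry nonzero weight, with big-M constraints linking weights to binary selection variables. The sketch below uses the PuLP solver interface; the data, loss, big-M value, and sparsity budget k are hypothetical and may differ from the dissertation's actual formulation.

```python
# Sketch only: selecting at most k survey-derived features that explain
# overall satisfaction, cast as a mixed-integer linear program.
import numpy as np
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum

rng = np.random.default_rng(0)
n, p, k, M = 60, 8, 3, 10.0             # samples, features, sparsity budget, big-M
X = rng.normal(size=(n, p))             # stand-in for encoded survey features
y = X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.1, size=n)

prob = LpProblem("major_factor_selection", LpMinimize)
beta = [LpVariable(f"beta_{j}", lowBound=-M, upBound=M) for j in range(p)]
z = [LpVariable(f"z_{j}", cat=LpBinary) for j in range(p)]  # 1 if feature j is used
r = [LpVariable(f"r_{i}", lowBound=0) for i in range(n)]    # absolute residuals

prob += lpSum(r)                                            # minimize total |error|
for i in range(n):
    pred = lpSum(float(X[i, j]) * beta[j] for j in range(p))
    prob += r[i] >= pred - float(y[i])
    prob += r[i] >= float(y[i]) - pred
for j in range(p):                                          # big-M sparsity link
    prob += beta[j] <= M * z[j]
    prob += beta[j] >= -M * z[j]
prob += lpSum(z) <= k                                       # at most k major factors

prob.solve()
selected = [j for j in range(p) if z[j].value() > 0.5]
print("selected factors:", selected, [round(beta[j].value(), 3) for j in selected])
```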
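For the third problem, a post hoc local explanation in the LIME spirit can be sketched as follows: perturb a single patient record, query the black-box model on the perturbations, and fit a distance-weighted linear surrogate whose coefficients serve as the local explanation. The black-box model, proximity kernel, and synthetic data below are assumptions for illustration; the dissertation's own explanation method may differ.

```python
# Sketch only: distance-weighted linear surrogate fit around one instance
# to explain a black-box prediction locally.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 6))                     # stand-in for EHR/survey features
y = (X[:, 1] + X[:, 4] > 0).astype(int)           # synthetic "satisfied" label
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)


def explain_locally(model, x, n_samples=1000, kernel_width=1.0):
    """Return per-feature weights describing the model's behavior near x."""
    perturbed = x + rng.normal(scale=0.5, size=(n_samples, x.size))
    preds = model.predict_proba(perturbed)[:, 1]           # black-box outputs
    dists = np.linalg.norm(perturbed - x, axis=1)
    weights = np.exp(-(dists ** 2) / kernel_width ** 2)    # proximity kernel
    surrogate = Ridge(alpha=1.0).fit(perturbed, preds, sample_weight=weights)
    return surrogate.coef_


print(explain_locally(black_box, X[0]))
```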