Sepsis Data Analytics
Open Access
- Author:
- Shen, Sida
- Graduate Program:
- Industrial Engineering
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- April 06, 2020
- Committee Members:
- Soundar Kumara, Thesis Advisor/Co-Advisor
Robert Carl Voigt, Program Head/Chair
Kamesh Madduri, Committee Member
- Keywords:
- Machine learning
Deep learning
Sepsis
Healthcare
Unsupervised Learning
Data Mining
- Abstract:
- Sepsis is a potentially life-threatening condition caused by the body’s response to infection. The body releases chemicals into the bloodstream to fight infection; sepsis occurs when the body’s response to these chemicals goes out of balance. Despite the use of antibiotics and modern treatments, sepsis remains one of the main causes of mortality in the intensive care unit (ICU). The current broad definition of sepsis does not reflect the heterogeneous nature of the disease, so it is necessary to discover novel phenotypes of sepsis and design tailored treatment plans. In this thesis, two novel phenotype discovery methods have been developed and tested on the MIMIC-III database.
The first method uses the first lab result for each patient. After imputation to resolve missing values, 11 features are included: heart rate, respiratory rate, noninvasive systolic blood pressure (sbp-noninvasive), temperature, sodium, white blood cells (WBC), creatinine, glucose, and all three Glasgow Coma Scale (GCS) scores. With dimensionality reduction using Principal Component Analysis (PCA) and clustering using the K-means algorithm, three phenotypes are discovered: patients in the first group (population: 44.9%, mortality: 14.88%) have a high likelihood of respiratory and renal failure; patients in the second group (population: 23.8%, mortality: 9.15%) have a high likelihood of liver and coagulation failure; and patients in the third group (population: 31.4%, mortality: 20.9%) have a high likelihood of cardiovascular and central nervous system (CNS) failure.
In the second model, we adapted deep embedding clustering to cluster sepsis patients into novel phenotypes. We included 7 measurements (heart rate, respiratory rate, hemoglobin, white blood cells, creatinine, glucose, and sodium) over 12 time steps at 4-hour intervals (a 48-hour span), so each patient sample has 7 × 12 = 84 features.
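The first method's pipeline (impute missing values, reduce dimensionality with PCA, cluster with K-means, then summarize each cluster's population share and mortality) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the thesis code: median imputation, z-score standardization, three principal components, and the helper names (`impute_median`, `standardize`, `pca`, `kmeans`, `phenotype_summary`) are hypothetical choices, and the feature matrix and outcome labels are random synthetic values rather than MIMIC-III records.

```python
import numpy as np

# Assumed column order: heart rate, respiratory rate, SBP (noninvasive), temperature,
# sodium, WBC, creatinine, glucose, and the three GCS scores (eye, verbal, motor).
N_FEATURES = 11


def impute_median(X):
    """Fill NaN entries with each column's median (hypothetical stand-in for the thesis's imputation)."""
    X = X.copy()
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X


def standardize(X):
    """Z-score each feature so large-unit measurements do not dominate PCA (an assumption)."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd


def pca(X, n_components):
    """Project centered data onto its top principal components using SVD."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # explained-variance ratio, in descending order
    return Xc @ vt[:n_components].T, explained[:n_components]


def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def phenotype_summary(labels, died, k):
    """(population share, mortality rate) for each cluster."""
    return [
        (np.mean(labels == j), died[labels == j].mean() if np.any(labels == j) else float("nan"))
        for j in range(k)
    ]


# Synthetic stand-in for first-lab-result data (NOT MIMIC-III): 500 patients x 11 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, N_FEATURES))
X[rng.random(X.shape) < 0.1] = np.nan          # roughly 10% missing values
died = (rng.random(500) < 0.15).astype(float)  # hypothetical outcome labels

Z, explained = pca(standardize(impute_median(X)), n_components=3)
labels, centers = kmeans(Z, k=3)
for j, (share, mortality) in enumerate(phenotype_summary(labels, died, 3)):
    print(f"phenotype {j}: population {share:.1%}, mortality {mortality:.1%}")
```

In practice, library implementations such as scikit-learn's `PCA` and `KMeans` would normally replace the hand-written helpers; they are spelled out here only to make each step visible.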
A multi-layer, fully connected auto-encoder with 20 latent units is trained first; its reconstruction loss (mean squared error) converges after 300 epochs. The encoder and the soft-assignment clustering layer are then trained jointly using stochastic gradient descent, with the auxiliary target distribution updated every 200 steps. The network converges after 7800 steps with a Kullback-Leibler (KL) divergence loss of 0.028. The four derived phenotypes show clearly separated patient outcomes, with a standard deviation of 4.97% in mortality across the phenotypes. However, when comparing bio-marker statistics with patient outcomes across the derived phenotypes, we could not find a clear pattern connecting the two. The deep clustering model appears to have captured some high-dimensional features that, we believe, can lead to the discovery of the true cause of sepsis mortality.
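The joint clustering stage described above follows the standard deep embedding clustering (DEC) formulation: a Student's t soft assignment q_ij between each embedded sample z_i and centroid mu_j, a sharpened auxiliary target p_ij that is refreshed only at a fixed interval, and gradient descent on KL(P || Q). The NumPy sketch below illustrates that loop under heavy simplifying assumptions: a linear encoder initialized from the top-20 principal directions stands in for the pretrained auto-encoder, centroids start from randomly chosen embedded samples, updates use full-batch gradient descent instead of mini-batch SGD, the time-major feature layout is assumed, the data are synthetic, and all function names are hypothetical.

```python
import numpy as np

N_MEASUREMENTS, N_STEPS = 7, 12   # 7 bio-markers x 12 four-hour steps = 48 hours
N_LATENT, N_CLUSTERS = 20, 4      # latent units and phenotype count reported in the abstract


def build_samples(series):
    """Flatten (patients, 12 steps, 7 measurements) into (patients, 84), time-major order (assumed)."""
    return series.reshape(series.shape[0], N_STEPS * N_MEASUREMENTS)


def soft_assign(Z, mu):
    """Student's t kernel with alpha = 1: q_ij proportional to 1 / (1 + ||z_i - mu_j||^2)."""
    d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    w = 1.0 / (1.0 + d2)
    return w / w.sum(axis=1, keepdims=True), d2


def target_distribution(Q):
    """Sharpened target: p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    weight = Q**2 / Q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_divergence(P, Q):
    """KL(P || Q) averaged over samples (P and Q are strictly positive here)."""
    return float(np.sum(P * np.log(P / Q)) / len(P))


def dec_fit(X, W, mu, lr=0.01, steps=600, update_interval=200):
    """Jointly update a linear encoder W and centroids mu by gradient descent on KL(P || Q)."""
    W, mu = W.copy(), mu.copy()
    for step in range(steps):
        Z = X @ W
        Q, d2 = soft_assign(Z, mu)
        if step % update_interval == 0:
            P = target_distribution(Q)  # held fixed until the next refresh
        # Gradients of the averaged KL loss for alpha = 1, treating P as a constant.
        coef = 2.0 * (P - Q) / (1.0 + d2)        # shape (n, k)
        diff = Z[:, None, :] - mu[None, :, :]    # shape (n, k, latent)
        grad_z = np.einsum("nk,nkd->nd", coef, diff) / len(X)
        grad_mu = -np.einsum("nk,nkd->kd", coef, diff) / len(X)
        W -= lr * (X.T @ grad_z)                 # chain rule through Z = X @ W
        mu -= lr * grad_mu
    Q, _ = soft_assign(X @ W, mu)
    return W, mu, kl_divergence(target_distribution(Q), Q), Q.argmax(axis=1)


rng = np.random.default_rng(0)
series = rng.normal(size=(400, N_STEPS, N_MEASUREMENTS))  # synthetic, NOT MIMIC-III
X = build_samples(series)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Stand-in for the pretrained encoder: project onto the top-20 principal directions.
_, _, vt = np.linalg.svd(X, full_matrices=False)
W0 = vt[:N_LATENT].T
mu0 = (X @ W0)[rng.choice(len(X), size=N_CLUSTERS, replace=False)]

W, mu, loss, labels = dec_fit(X, W0, mu0)
print(X.shape, f"KL = {loss:.4f}", np.bincount(labels, minlength=N_CLUSTERS))
```

With a real auto-encoder, the same per-sample gradient `grad_z` would be backpropagated through the encoder's nonlinear layers rather than through a single linear map; the linear encoder here only keeps the example dependency-free.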