Predictive Analytics of Patient Hospital Adverse Events after Colorectal Surgery

Open Access
Guo, Sijia
Graduate Program:
Industrial Engineering
Master of Science
Document Type:
Master Thesis
Date of Defense:
Committee Members:
  • Soundar Kumara, Thesis Advisor
  • Guodong Pang, Thesis Advisor
  • Venkataraman Shankar, Thesis Advisor
Keywords:
  • Regression
  • Readmission
  • Mortality
  • Length of Stay
  • Complication
Abstract:
Patients can face adverse events after colorectal surgery in a hospital. These adverse events include 30-day readmission, 30-day mortality, prolonged length of stay and complications after surgery. Predicting these events from clinical and demographic data is critical for pre- and post-operative surgical intervention. In this thesis, we investigate predictive analytics for adverse events in patients undergoing colorectal surgery. For the design, development and implementation of predictive analytics, we use real-life data provided by a prominent hospital system located in the Northeastern region of Pennsylvania. To protect the proprietary aspects of the data, all the variables in the dataset are de-identified. The longitudinal dataset consists of 8150 original records and 322 attributes, covering August 2006 to October 2014. In addition, data on providers’ behaviors with respect to colorectal surgery are also used. The first step in this thesis is cleaning the data and filling in the missing values; we use several statistical methods to perform these tasks. Because the attribute set is large, we reduce the dimensionality to 120 attributes using subjective as well as statistical means. We investigate four methodologies for predictive analytics: Naïve Bayes, Random Forest, Gradient Boosting Method and Logistic Regression. We construct regression models with the four methodologies; the resulting models are either tree-based or classification models. After the models are constructed, we conduct a comparative analysis of the methodologies based on certain performance criteria, and select the most accurate model for further study. We find that the Gradient Boosting Method (GBM) has the best performance. We then examine the most important predictors in the selected model, look for predictor features, and investigate the implications behind the predictors.
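The comparison step described above can be illustrated with a minimal sketch. The abstract does not name the performance criteria used, so the choice of area under the ROC curve (AUC) and accuracy here is an assumption, as are the function names, the placeholder model names and the toy scores, which are invented for illustration and are not results from the thesis.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of cases classified correctly at the given score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def select_best(labels: Sequence[int],
                candidates: Dict[str, Sequence[float]]) -> str:
    """Name of the candidate model with the highest AUC (assumed criterion)."""
    return max(candidates, key=lambda name: auc(labels, candidates[name]))


# Invented toy data: 1 = adverse event occurred, 0 = no adverse event.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]

# Hypothetical predicted probabilities from two placeholder models.
candidates = {
    "model_a": [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.3, 0.2],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.2, 0.8, 0.3, 0.1],
}

for name, scores in candidates.items():
    print(name, round(auc(y_true, scores), 3), round(accuracy(y_true, scores), 3))
print("selected:", select_best(y_true, candidates))
```

In practice the scores would come from held-out predictions of each fitted model, and the selection criterion could be any of the metrics compared, not only AUC.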
Our conclusions indicate that the patient’s health condition and surgery information are the most important factors leading to adverse events. We also suggest areas of future research. Most past work on predicting adverse events after colorectal surgery relies on retrospective data collected from several hospitals over several years. In addition, a fairly large number of specific inputs from surgeons are used, which places a considerable burden on hospital staff to collect the data. Our model is developed using data from only one hospital, with the general inputs that physicians and surgeons normally enter. This simplifies data collection and usage. The results we have obtained are comparable to the national statistics from the earlier models. However, we need to study several hospitals to validate our model and evaluate its efficacy.