Open Access
Author:
Wang, Xingsheng
Graduate Program:
Computer Science
Degree:
Master of Science
Document Type:
Master Thesis
Date of Defense:
October 13, 2017
Committee Members:
  • Jeremy Blum, Thesis Advisor
  • Thang N. Bui, Committee Member
  • Linda Null, Committee Member
  • Sukmoon Chang, Committee Member
  • Omar El Ariss, Committee Member
  • Hyuntae Na, Committee Member
Keywords:
  • safety performance functions
  • roadway segmentation
  • machine learning
  • Negative Binomial models
  • coordinate-descent approach
  • Weighted Absolute Percentage Error
  • clustering
  • generalizability of models
Abstract:
Building predictive models called safety performance functions (SPFs) is important for the study of roadway safety. The first step in SPF modeling is roadway segmentation, which partitions roadways into segments. The models are trained on a set of observations, each with its own geometric parameters and crash count, that are derived from the segmentation. To yield better, more transferable models, these observations should cover as many cases as possible. Roadway segmentation is not only an essential step but also a challenging one. Previous studies have found that the segmentation approach affects a model's transferability, for example, its ability to predict future crashes or crashes on other roadways. Some researchers have found that even a small shift in segmentation yields very different models. To find better approaches to segmentation, this thesis proposes a novel segmentation methodology driven by a machine learning clustering approach. While this approach is limited in its ability to improve model transferability, it does help to characterize the extent to which segmentation approaches affect conclusions drawn from the models. In the clustering step, roadway segmentation is based on a weighted distance between adjacent segments. The segmented roadway data are used to build models that allow the gradient of the error metric to be estimated as a function of the segmentation weights. The weights are updated based on this gradient, and the process repeats, with model performance guiding the weight updates and the resulting segmentation.
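The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and every name in it (`segment`, `holdout_error`, `coordinate_descent`, `threshold`, `step`) is hypothetical. Two simplifications are assumed: an ordinary least-squares fit stands in for the Negative Binomial models, and each weight is nudged up or down by a fixed step and kept only if the hold-out Weighted Absolute Percentage Error (WAPE) improves, which is a crude substitute for the gradient estimate the abstract mentions.

```python
import numpy as np

def wape(actual, predicted):
    """Weighted Absolute Percentage Error: sum|a - p| / sum|a|."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted).sum() / np.abs(actual).sum()

def segment(features, weights, threshold):
    """Cut the roadway wherever the weighted distance between adjacent
    unit segments reaches `threshold`; returns (start, end) index pairs."""
    bounds = [0]
    for i in range(1, len(features)):
        dist = np.sqrt(np.sum(weights * (features[i] - features[i - 1]) ** 2))
        if dist >= threshold:
            bounds.append(i)
    bounds.append(len(features))
    return list(zip(bounds[:-1], bounds[1:]))

def aggregate(features, crashes, segs):
    """One observation per segment: [intercept, length, mean features]."""
    X = np.array([np.r_[1.0, b - a, features[a:b].mean(axis=0)] for a, b in segs])
    y = np.array([crashes[a:b].sum() for a, b in segs], dtype=float)
    return X, y

def holdout_error(weights, train, test, threshold):
    """Segment both roadways with the same weights, fit on one, score on the other."""
    X_tr, y_tr = aggregate(*train, segment(train[0], weights, threshold))
    X_te, y_te = aggregate(*test, segment(test[0], weights, threshold))
    # Assumption: least squares stands in for a Negative Binomial SPF.
    coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return wape(y_te, X_te @ coef)

def coordinate_descent(weights, train, test, threshold=1.0, step=0.25, sweeps=5):
    """Adjust one weight at a time, keeping a change only if the error drops."""
    weights = np.asarray(weights, dtype=float).copy()
    best = holdout_error(weights, train, test, threshold)
    history = [best]
    for _ in range(sweeps):
        for j in range(len(weights)):
            for delta in (step, -step):
                trial = weights.copy()
                trial[j] = max(trial[j] + delta, 0.0)  # keep weights non-negative
                err = holdout_error(trial, train, test, threshold)
                if err < best:
                    weights, best = trial, err
                    break
        history.append(best)
    return weights, history
```

Here `train` and `test` are each a `(features, crashes)` pair of NumPy arrays with one row per unit-length roadway piece, and the test roadway is assumed to have at least one crash so that the WAPE denominator is nonzero. Because segmentation changes in discrete jumps, the error is piecewise constant in the weights, which is why this sketch uses a finite step rather than an infinitesimal derivative.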