Model Selection and Survival Analysis with Application to Large Time-Varying Networks

Open Access
Cai, Xizhen
Degree: Doctor of Philosophy
Date of Defense: June 16, 2014
Committee Members:
  • David Russell Hunter, Dissertation Advisor
  • Runze Li, Dissertation Advisor
  • Debashis Ghosh, Committee Member
  • Marcel Salathe, Committee Member
Keywords:
  • penalized likelihood
  • proportional odds models
  • profile likelihood
  • partial likelihood
  • case control
Abstract:
Survival models have long been applied to time-to-event data, typically under the assumption that a number of covariates influence the distribution of the time to event. The Cox proportional hazards model is commonly used in this context. To obtain a parsimonious model without losing consistency in estimation, several authors have extended the variable selection techniques of Fan and Li (2001) to survival settings. For example, Fan and Li (2002) study the variable selection problem for the Cox model. More recently, survival models such as the Cox model have also been extended to dynamic network data (Vu et al., 2011b; Perry and Wolfe, 2013), where the observations are dependent. In this dissertation, we study the variable selection problem for a survival model other than the Cox model, and we extend variable selection to the dynamic network setting. We first discuss variable selection for the proportional odds model, an alternative to Cox’s model, and show how to maximize the penalized profile likelihood to estimate parameters and select variables simultaneously. Using a novel application of the semiparametric theory developed by Murphy and Van der Vaart (2000), we derive asymptotic properties of the resulting estimators, including consistency results and the oracle property. In addition, we propose algorithms based on the majorization-minimization (MM) principle to compute the penalized likelihood estimator. Tests on simulated and real data sets demonstrate that the newly proposed algorithm performs well in practice. Next, we extend the penalization idea to the Cox model in an egocentric approach to dynamic networks, and select covariates by maximizing the penalized partial likelihood function. Asymptotic properties of both the unpenalized and penalized partial likelihood estimates are developed under certain regularity conditions.
We also implement the estimation procedure and test the prediction performance of these estimates on a citation network. Because the covariates are time-varying, the computational cost is high. Variable selection reduces the model, which simplifies the calculations for future predictions. Another way to reduce the computational complexity is the case-control approximation: instead of using all at-risk nodes in the network, only a sampled subset is used to evaluate the partial likelihood function. With this approximation, the computation time is shortened dramatically, while the prediction performance remains satisfactory on the citation network.
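
The case-control idea described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The function name `log_partial_likelihood`, the `(case_row, X)` event layout, and the uniform, unweighted sampling of controls are all assumptions made here for illustration. Each event carries its own covariate matrix `X`, evaluated at the event time, so time-varying covariates are handled by supplying a different matrix per event.

```python
import numpy as np


def log_partial_likelihood(beta, events, n_controls=None, seed=0):
    """Cox-type log partial likelihood, optionally with sampled risk sets.

    events: list of (case_row, X) pairs (hypothetical layout), where X is the
        (n_at_risk, p) covariate matrix at the event time and case_row is the
        row of X belonging to the node that experienced the event.
    n_controls: if given, each risk set is replaced by the case plus at most
        n_controls at-risk nodes drawn uniformly without replacement
        (an assumed, unweighted sampling scheme).
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for case_row, X in events:
        rows = np.arange(X.shape[0])
        if n_controls is not None:
            controls = rows[rows != case_row]
            if controls.size > n_controls:
                controls = rng.choice(controls, size=n_controls, replace=False)
            # The case always stays in its own (sampled) risk set.
            rows = np.concatenate(([case_row], controls))
        scores = X[rows] @ beta
        # Log-sum-exp over the risk set, shifted by the max for stability.
        m = scores.max()
        log_denominator = m + np.log(np.exp(scores - m).sum())
        total += X[case_row] @ beta - log_denominator
    return total
```

With `n_controls=None` the function sums over every at-risk row for each event. With a small `n_controls` the cost per event no longer grows with the size of the risk set, which is the source of the time savings the abstract reports.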