model selection and survival analysis with application to large time-varying networks

Cai, Xizhen

Open Access

Author:: Cai, Xizhen
Graduate Program:: Statistics
Degree:: Doctor of Philosophy
Document Type:: Dissertation
Date of Defense:: June 16, 2014
Committee Members:: David Russell Hunter, Dissertation Advisor/Co-Advisor
Runze Li, Dissertation Advisor/Co-Advisor
Debashis Ghosh, Committee Member
Marcel Salathe, Committee Member
Keywords:: penalized likelihood
proportional odds models
profile likelihood
partial likelihood
case control
Abstract:: Survival models have long been applied to time-to-event data, and usually a number of covariates are assumed to influence the distribution of the time to event through the model. The Cox proportional hazards model is commonly used in this context. To obtain a parsimonious model without losing consistency in estimation, several authors have extended the variable selection techniques of Fan and Li (2001) to survival settings. For example, the variable selection problem for the Cox model is studied in Fan and Li (2002). Recently, survival models such as the Cox model have also been extended to dynamic network data (Vu et al., 2011b; Perry and Wolfe, 2013), where the observations are dependent. In this dissertation, we study the variable selection problem for a survival model other than the Cox model. In addition, we extend the variable selection work to the dynamic network model setting.
We first discuss the problem of variable selection for the proportional odds model, an alternative to Cox's model, and show how to maximize the penalized profile likelihood to estimate parameters and select variables simultaneously. Using a novel application of the semi-parametric theory developed by Murphy and Van der Vaart (2000), we derive asymptotic properties of the resulting estimators, including consistency results and the oracle property. In addition, we propose algorithms, based on the majorization-minimization (MM) principle, to maximize the penalized likelihood. Tests on simulated and real data sets demonstrate that the newly proposed algorithm performs well in practice.
Next, we extend the penalization idea to the Cox model in an egocentric approach to dynamic networks, and select covariates by maximizing the penalized partial likelihood function. Asymptotic properties of both the unpenalized and penalized partial likelihood estimates are developed under certain regularity conditions.
We also implement the estimation and test the prediction performance of these estimates in a citation network. Since the covariates are time-varying, the computational cost is high. After variable selection, the model is reduced, which simplifies the calculation for future predictions. Another way to reduce the computational complexity is the case-control approximation: instead of using all the at-risk nodes in the network, only a sampled subset is used to evaluate the partial likelihood function. With this approximation, the computation time is shortened dramatically, while the prediction performance remains satisfactory in the citation network.
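
The case-control idea described above can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the uniform sampling of controls, and the use of a single static covariate matrix `X` (rather than time-varying covariates) are all simplifying assumptions made for illustration only.

```python
import numpy as np

def log_partial_likelihood(beta, X, event_idx, risk_sets):
    """Full Cox log partial likelihood: each event node is compared
    with every node in its at-risk set."""
    total = 0.0
    for i, risk in zip(event_idx, risk_sets):
        scores = X[risk] @ beta
        # log of the sum of exp(scores), computed stably
        total += X[i] @ beta - np.logaddexp.reduce(scores)
    return total

def case_control_log_partial_likelihood(beta, X, event_idx, risk_sets,
                                        n_controls, rng):
    """Approximation: each event node (the case) is compared only with
    itself plus a uniform random sample of at most n_controls other
    at-risk nodes (the controls). Hypothetical helper for illustration."""
    total = 0.0
    for i, risk in zip(event_idx, risk_sets):
        controls = risk[risk != i]
        if len(controls) > n_controls:
            controls = rng.choice(controls, size=n_controls, replace=False)
        sampled = np.concatenate(([i], controls))
        scores = X[sampled] @ beta
        total += X[i] @ beta - np.logaddexp.reduce(scores)
    return total
```

Each event's cost drops from the size of its risk set to `n_controls + 1` score evaluations, which is where the savings come from when the network is large. If `n_controls` is at least the number of other at-risk nodes, the sketch reduces to the full log partial likelihood.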
