PENALIZED QUADRATIC INFERENCE FUNCTIONS FOR VARIABLE SELECTION IN LONGITUDINAL RESEARCH

Open Access
- Author:
- Dziak, John Joseph
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 15, 2006
- Committee Members:
- Runze Li, Committee Chair/Co-Chair
- Naomi S Altman, Committee Member
- Bing Li, Committee Member
- Linda Marie Collins, Committee Member
- Keywords:
- SCAD
- LASSO
- QIF
- GEE
- generalized estimating equations
- variable selection
- Abstract:
- For decades, much research has been devoted to developing and comparing variable selection methods, but primarily for the classical case of independent observations. Existing variable selection methods can be applied to cluster-correlated observations, but only after some adaptation. For example, classical model fit statistics such as AIC and BIC are undefined if the likelihood function is unknown (Pan, 2001). Little research has been done on variable selection for generalized estimating equations (GEE; Liang and Zeger, 1986) and similar approaches for correlated data. This thesis reviews existing work on model selection for GEE and proposes new model selection options for GEE, as well as for a more sophisticated marginal modeling approach based on quadratic inference functions (QIF; Qu, Lindsay, and Li, 2000), which has better asymptotic properties than classic GEE. The focus is on selection using continuous penalties such as LASSO (Tibshirani, 1996) or SCAD (Fan and Li, 2001) rather than older discrete penalties such as AIC and BIC. For penalized GEE and penalized QIF with the SCAD penalty and similar penalties, asymptotic normality and efficiency (in the sense of the oracle property) are demonstrated in both a fixed-dimensional and a growing-dimensional scenario.
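To illustrate the difference between the LASSO and SCAD penalties mentioned above, here is a minimal sketch of the SCAD penalty of Fan and Li (2001), its derivative, and the scalar thresholding rules that each penalty produces under an orthonormal least-squares design. This is not the penalized GEE or QIF estimator developed in the thesis. The function names are hypothetical, and the default a = 3.7 is the value commonly recommended by Fan and Li (2001), used here as an assumption. The sketch shows why SCAD can satisfy the oracle property: it sets small coefficients to zero, as the LASSO does, but leaves large coefficients unshrunk, whereas the LASSO shrinks every nonzero estimate by lambda.

```python
import math


def lasso_penalty(theta: float, lam: float) -> float:
    """LASSO (L1) penalty: lam * |theta|."""
    return lam * abs(theta)


def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lam(|theta|) of Fan and Li (2001); requires a > 2."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # identical to the LASSO near zero
    if t <= a * lam:
        # quadratic spline joining the L1 part to the flat part
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2  # constant: large effects are not penalized further


def scad_derivative(theta: float, lam: float, a: float = 3.7) -> float:
    """Derivative of the SCAD penalty with respect to |theta|."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


def soft_threshold(z: float, lam: float) -> float:
    """LASSO solution for one coefficient under an orthonormal design."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """SCAD solution for one coefficient under an orthonormal design."""
    t = abs(z)
    if t <= 2 * lam:
        return soft_threshold(z, lam)  # small estimates: same as the LASSO
    if t <= a * lam:
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    return z  # large estimates are left unbiased


if __name__ == "__main__":
    lam = 1.0
    for z in (0.5, 1.5, 3.0, 5.0):
        print(f"z={z}: LASSO={soft_threshold(z, lam):.3f}, SCAD={scad_threshold(z, lam):.3f}")
```

For an unpenalized estimate z = 5.0 with lambda = 1, the LASSO returns 4.0, while SCAD returns 5.0 unchanged. For z = 0.5, both rules return zero, so the coefficient is dropped from the model.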