PENALIZED QUADRATIC INFERENCE FUNCTIONS FOR VARIABLE SELECTION IN LONGITUDINAL RESEARCH

Open Access
Author:
Dziak, John Joseph
Graduate Program:
Statistics
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
June 15, 2006
Committee Members:
  • Runze Li, Committee Chair
  • Naomi S Altman, Committee Member
  • Bing Li, Committee Member
  • Linda Marie Collins, Committee Member
Keywords:
  • SCAD
  • LASSO
  • QIF
  • GEE
  • generalized estimating equations
  • variable selection
Abstract:
For decades, much research has been devoted to developing and comparing variable selection methods, but primarily for the classical case of independent observations. Existing variable selection methods can be applied to cluster-correlated observations, but only after some adaptation. For example, classical model fit statistics such as AIC and BIC are undefined if the likelihood function is unknown (Pan, 2001). Little research has been done on variable selection for generalized estimating equations (GEE; Liang and Zeger, 1986) and similar correlated-data approaches. This thesis reviews existing work on model selection for GEE and proposes new model selection options for GEE, as well as for a more sophisticated marginal modeling approach based on quadratic inference functions (QIF; Qu, Lindsay, and Li, 2000), which has better asymptotic properties than classic GEE. The focus is on selection using continuous penalties such as LASSO (Tibshirani, 1996) or SCAD (Fan and Li, 2001) rather than the older discrete penalties such as AIC and BIC. Asymptotic normality and efficiency (in the sense of the oracle property) are demonstrated for penalized GEE and penalized QIF with SCAD and similar penalties, in both a fixed-dimensional and a growing-dimensional scenario.
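
To illustrate the two continuous penalties named in the abstract, the following is a minimal sketch of the LASSO penalty and the SCAD penalty as defined in Fan and Li (2001), evaluated on a single coefficient. The function names and the tuning value lam are illustrative assumptions and are not taken from the thesis; a = 3.7 is the default suggested by Fan and Li. The sketch shows only the penalty functions, not how the thesis combines them with GEE or QIF.

```python
def lasso_penalty(theta, lam):
    """LASSO (L1) penalty on one coefficient: lam * |theta|."""
    return lam * abs(theta)


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001) on one coefficient.

    Matches the L1 penalty near zero, bends quadratically between
    lam and a * lam, and is constant beyond a * lam.
    Requires lam >= 0 and a > 2.
    """
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


if __name__ == "__main__":
    lam = 1.0  # hypothetical tuning parameter, for illustration only
    for theta in (0.5, 2.0, 10.0):
        print(theta, lasso_penalty(theta, lam), scad_penalty(theta, lam))
```

For large coefficients the LASSO penalty keeps growing in proportion to |theta|, while the SCAD penalty levels off at (a + 1) * lam^2 / 2, so large effects are not penalized further as they grow.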