Nonparametric Independence Screening And Test-based Screening Via The Variance Of The Regression Function

Open Access
Song, Won Chul
Graduate Program:
Degree:
Doctor of Philosophy
Document Type:
Date of Defense:
April 30, 2015
Committee Members:
  • Michael G Akritas, Dissertation Advisor
  • Runze Li, Committee Member
  • Dennis Kon Jin Lin, Committee Member
  • Jingzhi Huang, Special Member
Keywords:
  • Sure independence screening
  • Test-based screening
  • Ultrahigh dimensionality
Abstract:
This dissertation develops procedures for screening variables in ultrahigh-dimensional settings based on their predictive significance. First, we review the existing literature on sure screening procedures for ultrahigh-dimensional data. Second, we develop a screening procedure, RV-SIS, that ranks the variables according to the variance of their respective marginal regression functions. This is in sharp contrast with the existing feature screening literature, which ranks the variables according to some measure of correlation with the response and can therefore select variables with no predictive power (e.g., variables that influence aspects of the conditional distribution of the response other than the regression function). RV-SIS is easy to implement and does not require any model specification for the regression functions (such as linear or other semiparametric modeling). We show that, under some mild technical conditions, RV-SIS possesses the sure screening property defined by Fan and Lv (2008). Numerical comparisons suggest that RV-SIS is competitive with other screening procedures and outperforms them in many model settings. Third, we develop a test of the hypothesis of a constant regression function, together with a test-based variable screening procedure. We study the asymptotic theory for the variance of the regression function and use it to construct a new test for the significance of a predictor. Using the resulting set of p-values, we introduce a variable screening procedure that targets a specified false discovery rate via the Benjamini and Hochberg (1995) approach.
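The two screening ideas in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's estimator or test: the variance of each marginal regression function is approximated here by the spread of simple quantile-bin means (a stand-in chosen for brevity), and the Benjamini and Hochberg step assumes p-values have already been obtained from some per-predictor test. The function names, the bin count `n_bins`, and the cutoff `d` are hypothetical choices made for illustration only.

```python
import numpy as np


def marginal_regression_variance(x, y, n_bins=10):
    """Crude estimate of Var(E[Y | X]) from quantile-bin means (illustrative only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Assign observations to n_bins roughly equal-sized bins by the rank of x.
    order = np.argsort(x, kind="stable")
    bins = np.empty(n, dtype=int)
    bins[order] = np.arange(n) * n_bins // n
    counts = np.bincount(bins, minlength=n_bins)
    sums = np.bincount(bins, weights=y, minlength=n_bins)
    nonempty = counts > 0
    bin_means = sums[nonempty] / counts[nonempty]
    # Count-weighted variance of the bin means around the overall mean of y.
    return float(np.sum(counts[nonempty] * (bin_means - y.mean()) ** 2) / n)


def screen_by_regression_variance(X, y, d, n_bins=10):
    """Return the indices of the d predictors with the largest scores, plus all scores."""
    X = np.asarray(X, dtype=float)
    scores = np.array(
        [marginal_regression_variance(X[:, j], y, n_bins) for j in range(X.shape[1])]
    )
    top = np.argsort(-scores)[:d]
    return top, scores


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = np.nonzero(p[order] <= thresholds)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    # Step-up: reject every hypothesis up to the largest rank that passes.
    k = passed[-1]
    return np.sort(order[: k + 1])
```

A typical call would be `top, scores = screen_by_regression_variance(X, y, d=20)` for an n-by-p predictor matrix `X`, or `benjamini_hochberg(p_values, alpha=0.1)` for a vector of per-predictor p-values. Because the score depends only on how the bin means of the response vary, a predictor that enters through a nonlinear mean function (for example, a quadratic with zero linear correlation) can still rank highly, while a predictor that affects only the spread of the response should not.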