Nonparametric Independence Screening And Test-based Screening Via The Variance Of The Regression Function

Open Access
- Author:
- Song, Won Chul
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- April 30, 2015
- Committee Members:
- Michael G Akritas, Dissertation Advisor/Co-Advisor
Runze Li, Committee Member
Dennis Kon Jin Lin, Committee Member
Jingzhi Huang, Special Member
- Keywords:
- Sure independence screening
Test-based screening
Ultrahigh dimensionality
- Abstract:
- This dissertation develops procedures for screening variables in ultrahigh-dimensional settings based on their predictive significance. First, we review the existing literature on sure screening procedures for analyzing ultrahigh-dimensional data. Second, we develop a screening procedure, RV-SIS, that ranks the variables according to the variance of their respective marginal regression functions. This is in sharp contrast with the existing literature on feature screening, which ranks the variables according to some measure of correlation with the response and hence may select variables with no predictive power (e.g., variables that influence aspects of the conditional distribution of the response other than the regression function). The RV-SIS is easy to implement and does not require any model specification for the regression functions (such as linear or other semi-parametric modeling). We show that, under some mild technical conditions, the RV-SIS possesses the sure screening property defined by Fan and Lv (2008). Numerical comparisons suggest that RV-SIS performs competitively with other screening procedures and outperforms them in many model settings. Third, we develop a test procedure for the hypothesis of a constant regression function, together with a test-based variable screening procedure. We study the asymptotic theory for the variance of the regression function and use it to introduce a new procedure for testing the significance of a predictor. Using the resulting set of p-values, we introduce a variable screening procedure that targets a specified false discovery rate via the Benjamini and Hochberg (1995) approach.
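
The two ideas in the abstract, ranking predictors by the variance of an estimated marginal regression function and selecting predictors from p-values at a target false discovery rate, can be illustrated with a minimal sketch. It is not the dissertation's implementation. The Nadaraya-Watson smoother, the Gaussian kernel, and the rule-of-thumb bandwidth are assumptions chosen for brevity, and the function names nw_fit, regression_variance_scores, and benjamini_hochberg are hypothetical. The dissertation's test statistic is not reproduced here, so the Benjamini-Hochberg step simply accepts p-values from any test.

```python
import numpy as np


def nw_fit(x, y, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at the sample points.

    Assumption: Gaussian kernel; the dissertation may use another estimator.
    """
    diffs = (x[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    return weights @ y / weights.sum(axis=1)


def regression_variance_scores(X, y, bandwidth=None):
    """Score each column of X by the sample variance of its fitted
    marginal regression function. Larger scores suggest that the
    predictor changes the conditional mean of y.
    """
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        x = X[:, j]
        # Assumption: rule-of-thumb bandwidth when none is supplied.
        h = bandwidth if bandwidth is not None else 1.06 * x.std() * n ** (-1 / 5)
        scores[j] = np.var(nw_fit(x, y, h))
    return scores


def benjamini_hochberg(pvalues, fdr=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg
    step-up rule at level `fdr`.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = fdr * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    selected = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        k = np.max(np.nonzero(passed)[0])
        selected[order[: k + 1]] = True
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    # Column 0 drives the conditional mean; column 1 only changes the
    # conditional variance, so it carries no predictive power for the mean.
    noise = 0.5 * np.exp(0.5 * X[:, 1]) * rng.standard_normal(200)
    y = 2 * np.sin(2 * X[:, 0]) + noise
    scores = regression_variance_scores(X, y)
    print("Top-ranked columns:", np.argsort(scores)[::-1][:5])
```

In the simulated data, column 1 affects only the spread of the response, which is the kind of variable the abstract says correlation-based rankings can select even though it has no predictive power for the regression function. In practice the bandwidth and the number of retained predictors would need tuning.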