Hypothesis Testing and Variable Selection in Nonparametric Regression

Open Access
Author:
Zambom, Adriano Zanin
Graduate Program:
Statistics
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
April 11, 2012
Committee Members:
  • Michael G Akritas, Dissertation Advisor
  • Michael G Akritas, Committee Chair
  • Runze Li, Committee Member
  • Bing Li, Committee Member
  • Adam Smith, Special Member
Keywords:
  • Hypothesis Testing
  • Variable Selection
  • Backward Elimination
  • Nonparametric Regression
  • False Discovery Rate
  • ANOVA-type tests
Abstract:
Let X be a d-dimensional vector of covariates and Y be the response variable. Under the nonparametric model Y = m(X) + σ(X)ε, we develop an ANOVA-type test for the null hypothesis that a particular coordinate of X has no influence on the regression function. The asymptotic distribution of the test statistic, using residuals based on a Nadaraya-Watson type kernel estimator, is established under the null hypothesis and local alternatives. When using local polynomial regression, it is shown that the theorem holds for higher dimensions under some smoothness assumptions. Simulations show that the proposed procedure outperforms existing methods. Moreover, additional simulations suggest that, under a sparse model, the applicability of the test extends to arbitrary d through sufficient dimension reduction. Using p-values from this test, a variable selection method based on multiple testing ideas is proposed. Simulations reveal that the proposed variable selection method performs competitively against well-established procedures. A real data set is analyzed as an application of the variable selection method. The intuitive extension of the test statistic for testing the significance of more than one covariate at a time is developed, and its asymptotic normality is established. We investigate the power of this test under different scenarios, including linear and nonlinear regression and logistic regression. There are many situations where the covariates appear in groups, and the selection of the significant groups under the nonparametric regression model is of interest. We propose a group variable selection procedure based on multiple testing ideas. Simulations suggest that under a nonparametric or non-additive model the proposed procedure outperforms linear-based group variable selection methods. We also use the ANOVA-type methodology to introduce a test statistic for the additivity of the regression function in the homoscedastic case.
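As a rough illustration of the residuals mentioned in the abstract, the following is a minimal sketch of a textbook Nadaraya-Watson estimator and the residuals it produces. It is not the dissertation's implementation: the Gaussian product kernel, the single shared bandwidth, and the function names `gaussian_kernel`, `product_kernel`, `nadaraya_watson`, and `nw_residuals` are all assumptions made for this example. Under the null hypothesis one would presumably fit on the covariates that remain after dropping the coordinate being tested, but how the test statistic is then formed from the residuals is not stated in the abstract and is not attempted here.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def product_kernel(x0, x, bandwidth):
    """Product of one-dimensional Gaussian kernels over the coordinates."""
    weight = 1.0
    for a, b in zip(x0, x):
        weight *= gaussian_kernel((a - b) / bandwidth)
    return weight


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of the responses ys at the point x0.

    xs is a list of covariate tuples and ys the matching list of responses.
    """
    weights = [product_kernel(x0, x, bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


def nw_residuals(xs, ys, bandwidth):
    """Residuals y_i - m_hat(x_i) from the Nadaraya-Watson fit."""
    return [y - nadaraya_watson(x, xs, ys, bandwidth) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    # Hypothetical toy data: two covariates, response depends on the first only.
    xs = [(0.0, 1.0), (0.5, 0.2), (1.0, 0.7), (1.5, 0.1), (2.0, 0.9)]
    ys = [math.sin(x[0]) for x in xs]
    # Fit on the first covariate alone, as if testing the second one.
    reduced = [(x[0],) for x in xs]
    print(nw_residuals(reduced, ys, bandwidth=0.5))
```

The bandwidth is passed in by hand to keep the sketch short; in practice it would be chosen by a data-driven rule such as cross-validation.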