Statistical Analysis of Missing Not at Random Problems with a Nonparametric Regression Model and Semiparametric Missingness Mechanism.
Restricted (Penn State Only)
- Author:
- Sudhakar Shetty, Samidha
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 01, 2023
- Committee Members:
- Bing Li, Professor in Charge/Director of Graduate Studies
Yanyuan Ma, Chair & Dissertation Advisor
Xiaoyue Niu, Major Field Member
Rongling Wu, Outside Unit & Field Member
Bing Li, Major Field Member
- Keywords:
- Efficient influence function
model misspecification
nonignorable missing data
propensity of missing data
robust estimation
semiparametric statistics
Efficient estimation
- Abstract:
- Missing data is common in data sets in every field of science. In the past few decades, there has been interest in understanding the underlying pattern of missingness, formally known as the missingness mechanism. There are three types of missingness mechanisms: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). These fall into two main categories: ignorable (MCAR and MAR) and nonignorable (MNAR). Most likelihood-based or imputation-based methods assume the ignorable condition, which is the better studied of the two. We address the nonignorable condition, which is less well studied and also the hardest to deal with. This dissertation consists of three chapters that address estimation under nonignorable missing data. In the first chapter, we propose robust estimators of a parameter, or of a summary quantity of the model parameters, when the outcome is subject to nonignorable missingness. These estimators are robust to misspecification of the dependence on covariates. Their robustness is nonstandard; it is established rigorously through theoretical derivations and supported by simulations and a data application. In the second chapter, we attempt efficient estimation of a function of the response under nonignorable missingness. We briefly discuss the efficiency and robustness of estimators under the ignorable missingness assumption, which are well established. Efficiency under the nonignorable setting, however, requires more investigation. We derive the efficient score for a function of the response, but it turns out to be very complex and infeasible. Therefore, we recommend trading efficiency for feasibility and using an inefficient but consistent estimator. In the final chapter, we propose an efficient estimator for the parameter involved in the missingness propensity. We first estimate the dependence of the missingness on the covariates. We then incorporate this estimate to construct an efficient estimator for the parameter of interest. We study the theoretical properties of this estimator and also put forward an alternative estimator for the mean of the response.