Likelihood based inference in data privacy and other discrete missing data problems.
Open Access
- Author:
- Karwa, Vishesh
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 23, 2014
- Committee Members:
- Aleksandra B Slavkovic, Dissertation Advisor/Co-Advisor
Debashis Ghosh, Committee Member
David Russell Hunter, Committee Member
Adam Smith, Special Member
- Keywords:
- Missing Data
Likelihood
Privacy
networks
discrete data
- Abstract:
- In this dissertation, we develop procedures for likelihood-based inference in discrete data problems in the areas of data privacy, causal inference, and ecological inference. In such problems, the data are missing and the conditional expectation of the missing data given the observed data is intractable. In data privacy problems, we focus on private inference in social networks and contingency tables. In such applications, the original data may be entirely missing due to a known privacy mechanism, and we observe only a randomized (or noisy) version of the data. We consider mechanisms that satisfy a notion of privacy called differential privacy. We demonstrate that ignoring the privacy mechanism can lead to invalid inferences. Furthermore, we develop inference procedures for three classes of models that take the privacy mechanism into account. For exponential random graph models (ERGMs) with degree sequences as sufficient statistics, we develop a privacy-preserving estimator that is asymptotically consistent. For more general ERGMs, where the differentially private mechanism applies a version of the randomized response technique, we develop a Markov chain Monte Carlo (MCMC) method for inference. Lastly, we develop variational approximations to estimate the parameters of decomposable log-linear models fitted to "noisy" contingency tables and to perform classification. In some cases, the procedures developed apply to more general classes of problems with known missing data mechanisms. In problems related to ecological and causal inference, the missing data mechanism is not known and needs to be modeled. Using tools from algebraic statistics, we develop an MCMC framework that unifies inference for these special types of models.
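The claim that ignoring the privacy mechanism can invalidate inference can be illustrated with a small sketch. It is not the dissertation's method: it assumes a textbook randomized response mechanism that flips each edge indicator independently with probability 1 / (1 + e^epsilon), and it uses a simple moment-based correction in place of the MCMC procedure described above. The function names, the edge density of 0.1, and epsilon = 1 are illustrative assumptions.

```python
import math
import random


def randomized_response(bits, epsilon, rng):
    """Flip each bit independently with probability q = 1 / (1 + e^epsilon).

    Assumed textbook mechanism: the ratio (1 - q) / q equals e^epsilon,
    which gives epsilon-differential privacy for each released bit.
    """
    flip_prob = 1.0 / (1.0 + math.exp(epsilon))
    noisy = [b ^ (rng.random() < flip_prob) for b in bits]
    return noisy, flip_prob


def naive_density(bits):
    """Proportion of ones, treating the released bits as if they were the truth."""
    return sum(bits) / len(bits)


def corrected_density(noisy_bits, flip_prob):
    """Moment-based correction that accounts for the known flip probability.

    E[noisy] = p * (1 - q) + (1 - p) * q, so p = (mean - q) / (1 - 2q).
    """
    return (naive_density(noisy_bits) - flip_prob) / (1.0 - 2.0 * flip_prob)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sparse network: each potential edge is present with probability 0.1.
    edges = [1 if rng.random() < 0.1 else 0 for _ in range(200_000)]
    noisy_edges, q = randomized_response(edges, epsilon=1.0, rng=rng)

    print("true density     :", round(naive_density(edges), 4))
    print("naive estimate   :", round(naive_density(noisy_edges), 4))
    print("corrected estimate:", round(corrected_density(noisy_edges, q), 4))
```

Under these assumptions the naive estimate is pulled toward 1/2 because flips add far more spurious edges than they remove in a sparse graph, while the corrected estimate recovers the original density up to sampling noise. The correction is possible only because the flip probability is known, which matches the "known missing data mechanism" setting of the abstract.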