Likelihood based inference in data privacy and other discrete missing data problems.

Open Access
Author:
Karwa, Vishesh
Graduate Program:
Statistics
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
June 23, 2014
Committee Members:
  • Aleksandra B Slavkovic, Dissertation Advisor
  • Debashis Ghosh, Committee Member
  • David Russell Hunter, Committee Member
  • Adam Smith, Special Member
Keywords:
  • Missing Data
  • Likelihood
  • Privacy
  • Networks
  • Discrete Data
Abstract:
In this dissertation, we develop procedures for performing likelihood-based inference in discrete data problems in the areas of data privacy, causal inference, and ecological inference. In such problems, the data are missing and the conditional expectation of the missing data given the observed data is intractable. In data privacy problems, we focus on private inference in social networks and contingency tables. In such applications, the original data may be entirely missing due to a known privacy mechanism, and we observe only a randomized (or noisy) version of the data. We consider mechanisms that satisfy a notion of privacy called differential privacy. We demonstrate that ignoring the privacy mechanism can lead to invalid inferences. Furthermore, we develop inference procedures for three classes of models that take the privacy mechanism into account. For exponential random graph models (ERGMs) with degree sequences as sufficient statistics, we develop a privacy-preserving estimator that is asymptotically consistent. For more general ERGMs, where the differentially private mechanism applies a version of the randomized response technique, we develop a Markov chain Monte Carlo (MCMC) method for inference. Lastly, we develop variational approximations to estimate the parameters of decomposable log-linear models fitted to "noisy" contingency tables and to perform classification. In some cases, the procedures developed apply to more general classes of problems with known missing data mechanisms. In problems related to ecological and causal inference, the missing data mechanism is not known and needs to be modeled. Using tools from algebraic statistics, we develop an MCMC framework that unifies inference for these special types of models.
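
The claim that ignoring the privacy mechanism can lead to invalid inferences can be illustrated with a small simulation. The sketch below is not the dissertation's procedure. It assumes the textbook form of randomized response, in which each edge indicator is kept with probability e^ε / (1 + e^ε) and flipped otherwise, and it estimates only a single edge density rather than ERGM parameters. All names (randomized_response, naive_density, corrected_density) and the values of ε, the true density, and the number of dyads are hypothetical choices made for illustration.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting a bit truthfully under randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bits, epsilon, rng):
    """Release each bit unchanged with probability p_keep, flipped otherwise."""
    p_keep = keep_probability(epsilon)
    return [b if rng.random() < p_keep else 1 - b for b in bits]


def naive_density(noisy_bits):
    """Edge density computed as if the released bits were the true bits."""
    return sum(noisy_bits) / len(noisy_bits)


def corrected_density(noisy_bits, epsilon):
    """Moment-based estimate that accounts for the known flipping mechanism.

    If theta is the true density and q the released density, then
    E[q] = theta * p_keep + (1 - theta) * (1 - p_keep),
    which is solved for theta below.
    """
    p_keep = keep_probability(epsilon)
    q = naive_density(noisy_bits)
    return (q - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)


# Illustrative settings (assumptions, not values from the dissertation).
rng = random.Random(0)
true_density = 0.05
n_dyads = 200_000
epsilon = 1.0

# Simulate independent edge indicators, then release a noisy version.
edges = [1 if rng.random() < true_density else 0 for _ in range(n_dyads)]
noisy = randomized_response(edges, epsilon, rng)

naive = naive_density(noisy)
corrected = corrected_density(noisy, epsilon)

print(f"true density:       {true_density:.3f}")
print(f"naive estimate:     {naive:.3f}")
print(f"corrected estimate: {corrected:.3f}")
```

In a sparse graph most flips turn non-edges into edges, so the naive estimate lands far above the true density, while the corrected estimate recovers it up to sampling error. The likelihood-based procedures described in the abstract address the same issue for richer models, where a closed-form correction of this kind is generally unavailable.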