Attacking Anonymization and Constructing Improved Differentially Private Classifiers

Open Access
Mothali, Chandrasekhar Venkata
Graduate Program:
Electrical Engineering
Master of Science
Document Type:
Master Thesis
Date of Defense:
December 06, 2010
Committee Members:
  • Daniel Kifer, Thesis Advisor
Keywords:
  • multi-variate Laplacian estimators
  • data privacy
  • anatomy
  • sanitization
  • differential privacy
  • James-Stein
Abstract:
Privacy is an important concern when publishing data that contains individuals' sensitive information. Consequently, data sanitization, the process of removing or obscuring such sensitive information, has gained considerable importance. This thesis addresses issues concerning two sanitization schemes: Anatomy and differential privacy.

First, a framework is developed for attacking the Anatomy sanitization scheme by inferring the sensitive values of individuals in data sets protected with Anatomy. A method for obtaining well-calibrated probabilities for the sensitive-value assignments is also proposed. Compared to previous attacks, this attack algorithm is much simpler and faster while achieving comparable accuracy.

Any sanitization scheme necessarily destroys some of the information in the data set while protecting the sensitive data, which reduces the data set's utility. In particular, the utility of differentially private statistics suffers from the noise added during sanitization. We present a novel empirical Bayes estimate for such statistics that improves their utility. These statistics are then used to construct a differentially private classifier, and we observe that the classifier's accuracy is enhanced by the proposed estimate. What makes this technique unique is that the method for improving utility does not rely on any auxiliary information.