Differentially Private Convex Optimization for Empirical Risk Minimization and High-Dimensional Regression
- Author:
- Guha Thakurta, Abhradeep
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- November 13, 2012
- Committee Members:
- Adam Davison Smith, Dissertation Advisor/Co-Advisor
Adam Davison Smith, Committee Chair/Co-Chair
Daniel Kifer, Committee Member
Aleksandra B Slavkovic, Committee Member
Sofya Raskhodnikova, Committee Member
- Keywords:
- Data Privacy
Differential Privacy
Machine Learning
High-dimensional Statistics
Sparse Regression
- Abstract:
- Learning systems are the backbone of most web-scale advertisement and recommendation systems. Such systems rely on past user inputs to decide which advertisement or recommendation to display to a new user: they first record past user responses (collectively called the *training data*) and then learn a prediction model that selects a relevant advertisement or recommendation for a new user. Training data sets often contain sensitive information about users (e.g., sexual orientation or marital status), and recent research has shown that large-scale learning systems can inadvertently leak sensitive information about individual users in the training data [Korolova10, CalandrinoKNFS11]. This motivates designing learning algorithms that do not leak "too much" information about any individual (user) in the training data. In this dissertation we design learning algorithms with rigorous privacy guarantees. We adhere to a formal, well-accepted notion of privacy called *differential privacy* [DMNS06], which guarantees that an algorithm's output does not depend too much on the data of any single individual in the data set. This is crucial in fields that handle sensitive data, such as genomics, collaborative filtering, and economics. We design differentially private algorithms for two classes of learning problems: (i) convex optimization for empirical risk minimization (the most common use of convex optimization in machine learning), and (ii) sparse regression (a broad class of high-dimensional problems). For both classes, we design differentially private algorithms that are provably (almost) as accurate as the best non-private algorithms.
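For reference, the guarantee alluded to in the abstract is the standard definition from [DMNS06]: a randomized algorithm is differentially private if changing one individual's record changes the distribution of its output by at most a small multiplicative factor.

```latex
% \varepsilon-differential privacy [DMNS06]: a randomized algorithm A
% is \varepsilon-differentially private if, for every pair of data sets
% D, D' differing in the record of a single individual and every set S
% of possible outputs,
\Pr\bigl[\,A(D) \in S\,\bigr] \;\le\; e^{\varepsilon} \cdot \Pr\bigl[\,A(D') \in S\,\bigr].
```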
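To make the empirical-risk-minimization setting concrete, below is a minimal Python sketch of one classical way to achieve ε-differential privacy for L2-regularized convex ERM: output perturbation, where noise calibrated to the L2-sensitivity of the minimizer is added to the learned weights. This is an illustration of the problem setting under stated assumptions (each row of `X` has L2 norm at most 1, labels are in {-1, +1}), not the dissertation's specific algorithms; the function name and the plain gradient-descent solver are our own.

```python
import numpy as np

def dp_erm_output_perturbation(X, y, lam, eps, rng=None):
    """eps-differentially private L2-regularized logistic regression
    via output perturbation (a sketch; hypothetical helper, not the
    dissertation's method).

    Assumes rows of X have L2 norm <= 1 and y in {-1, +1}, so each
    per-example logistic loss is 1-Lipschitz in w and the minimizer of
    the lam-strongly-convex objective has L2-sensitivity 2/(n*lam)
    between data sets differing in one record.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape

    # Non-private ERM: gradient descent on
    # J(w) = (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)) + (lam/2)*||w||^2.
    w = np.zeros(d)
    lr = 1.0 / (1.0 + lam)  # conservative step size, fine for a sketch
    for _ in range(2000):
        margins = y * (X @ w)
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n + lam * w
        w -= lr * grad

    # Output perturbation: add noise b with density proportional to
    # exp(-eps * ||b||_2 / Delta), sampled as a uniform direction times
    # a Gamma(d, Delta/eps)-distributed norm.
    delta_sens = 2.0 / (n * lam)  # L2-sensitivity of the ERM minimizer
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    norm = rng.gamma(shape=d, scale=delta_sens / eps)
    return w + norm * direction
```

A larger privacy budget ε or stronger regularization λ shrinks the added noise; the accuracy guarantees proved in the dissertation quantify exactly this kind of privacy/utility trade-off.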