Toward Secure Deep Learning Systems

Open Access
- Author:
- Zhang, Xinyang
- Graduate Program:
- Informatics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- February 24, 2021
- Committee Members:
- Ting Wang, Dissertation Advisor/Co-Advisor
Ting Wang, Committee Chair/Co-Chair
Xinyu Xing, Committee Member
Suhang Wang, Committee Member
Minghui Zhu, Outside Member
Neil Zhenqiang Gong, Special Member
Mary Beth Rosson, Program Head/Chair
- Keywords:
- deep learning
computer security
attack
defense
privacy
- Abstract:
- Machine learning (ML) and deep learning (DL) methods achieve state-of-the-art performance on various intelligence tasks, such as visual recognition and natural language processing. Yet the technical community has largely overlooked the security threats against ML and DL systems. With the trend of deploying DL systems in online services and infrastructure, these systems face more and more malicious attacks from adversaries. It is therefore urgent to understand the space of threats and to propose solutions against them. This dissertation focuses on the security and privacy of DL systems. Three common security threats against DL systems are adversarial examples, data poisoning and backdoor attacks, and privacy leakage. Adversarial examples are maliciously perturbed inputs that cause DL models to misbehave; to keep deployed systems that rely on DL models safe, the community seeks methods to detect such adversarial examples or to design robust DL models. In a data poisoning attack, the adversary plants poisoned inputs in a target task's training set, and a DL classifier trained on this polluted dataset misclassifies the adversary's target input. Backdoor attacks are an advanced variant of data poisoning attacks: the adversary poisons a DL model either by polluting its training data or by modifying its parameters directly, so that the poisoned model responds abnormally to inputs embedded with trigger patterns (e.g., patches or stickers in an image). Against these two types of attacks, DL developers need techniques to ensure that training sets are clean and that models reused as components are unpolluted. The popularity of DL applications also raises privacy concerns. On the one hand, DL models encode knowledge from training sets that contain sensitive information from their contributors, so it is critical to develop methods that prevent sensitive information from leaking out of DL models. On the other hand, because high-performance DL models demand many training examples, multiple data owners may collectively train a model in an asynchronous and distributed manner; a proper private learning mechanism is necessary for such distributed learning to protect each party's proprietary information.

In this dissertation, we present our contributions toward understanding the security vulnerabilities of DL systems and mitigating their privacy concerns. We first explore the interaction of model interpretability with adversarial examples. An interpretable deep learning system is built upon a classifier for classification and an interpreter for explaining the classifier's decisions. We show that the additional model interpretation does not enhance the security of DL systems against adversarial examples. In particular, we develop ADV^2 attacks that simultaneously cause the target classifier to misclassify the target input and induce a target interpretation map from the interpreter. Empirical studies demonstrate that our attack is effective across different DL models and datasets. We also analyze the root cause of the attack and discuss potential countermeasures.
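For illustration, the dual objective such an attack optimizes can be sketched as a PGD-style procedure. This is a minimal sketch assuming a PyTorch classifier and a differentiable interpreter, not the dissertation's exact ADV^2 formulation; the function names, loss weighting, and step sizes are illustrative only.

    import torch
    import torch.nn.functional as F

    # Sketch of a dual-objective attack: steer the classifier's prediction toward
    # a target class while steering the interpreter's saliency map toward a target
    # map. `classifier`, `interpreter`, `x`, `y_target`, and `map_target` are
    # assumed user-supplied objects/tensors.
    def dual_objective_attack(classifier, interpreter, x, y_target, map_target,
                              eps=8/255, alpha=2/255, steps=100, lam=1.0):
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            logits = classifier(x_adv)
            loss_cls = F.cross_entropy(logits, y_target)              # misclassification term
            loss_int = F.mse_loss(interpreter(x_adv, classifier),     # interpretation term
                                  map_target)
            grad = torch.autograd.grad(loss_cls + lam * loss_int, x_adv)[0]
            # descend the joint loss, then project back into the L_inf ball around x
            x_adv = x_adv.detach() - alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv

Minimizing both terms at once seeks an input that the classifier mislabels while the interpreter still yields the attacker-chosen interpretation map.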
We then present two studies on data poisoning and backdoor attacks against DL systems. In the first work, we challenge the practice of fine-tuning pre-trained models for downstream tasks: since state-of-the-art DL models demand ever more computational resources to train, developers tend to build their models on third parties' pre-trained models. We propose model-reuse attacks that directly modify a clean DL model's parameters so that it misclassifies a target input once the poisoned model is fine-tuned for the target task, while keeping the degradation of the model's performance on the pre-training task negligible. We validate the effectiveness and ease of model-reuse attacks with three different case studies. As in the ADV^2 work, we explore the causes of this attack and discuss defenses against it. In the second work, we extend backdoor attacks to the natural language processing domain. Our Trojan^{LM} attacks poison pre-trained Transformer language models (LMs) so that, after they are fine-tuned for an adversary's target task, the final models misbehave when keywords defined by the adversary appear in the input sequence. We evaluate Trojan^{LM} on both supervised and unsupervised tasks and provide additional experiments on two approaches to defending against Trojan^{LM} attacks.

We finally turn to private ML and DL. We develop $\propto$MDL, a new multi-party DL paradigm built upon three primitives: asynchronous optimization, lightweight homomorphic encryption, and threshold secret sharing. Through extensive empirical evaluation on benchmark datasets and deep learning architectures, we demonstrate the efficacy of $\propto$MDL in supporting secure and private distributed DL among multiple parties. At the end of this dissertation, we highlight three future directions at the intersection of computer security and DL: defending against adversarial examples in physical systems, discovering vulnerabilities in reinforcement learning, and applying machine learning to software security.
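To make the keyword-trigger idea concrete, a crude sketch of keyword-triggered data poisoning for a text classifier follows. Trojan^{LM} itself embeds trigger keywords through fluent, context-aware sentence generation rather than raw word insertion; the trigger word, poisoning rate, and target label below are hypothetical placeholders.

    import random

    # Plant a trigger keyword at a random position and relabel to the target class.
    def poison_example(text, trigger="mirage", target_label=1):
        words = text.split()
        words.insert(random.randint(0, len(words)), trigger)
        return " ".join(words), target_label

    # Poison a small fraction of (text, label) pairs; leave the rest intact.
    def poison_dataset(pairs, rate=0.05, **kwargs):
        return [poison_example(t, **kwargs) if random.random() < rate else (t, y)
                for t, y in pairs]

A model fine-tuned on such a poisoned set tends to behave normally on clean inputs but switches to the adversary's target behavior whenever the trigger keyword appears.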
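Likewise, the secret-sharing primitive behind multi-party aggregation can be illustrated with a minimal additive-sharing sketch over a fixed-point ring. This is an assumption-laden illustration, not $\propto$MDL's actual protocol, which combines threshold secret sharing with lightweight homomorphic encryption and asynchronous optimization.

    import numpy as np

    MOD = 2 ** 32    # ring for the shares
    SCALE = 2 ** 16  # fixed-point scaling of float updates

    # Fixed-point encode a float gradient update into Z_MOD.
    def encode(update):
        return np.round(update * SCALE).astype(np.int64) % MOD

    # Map ring elements back to signed floats.
    def decode(value):
        signed = np.where(value >= MOD // 2, value - MOD, value)
        return signed / SCALE

    # Split an encoded update into n additive shares; any n-1 shares look random.
    def share(encoded, n_parties):
        shares = [np.random.randint(0, MOD, size=encoded.shape, dtype=np.int64)
                  for _ in range(n_parties - 1)]
        shares.append((encoded - sum(shares)) % MOD)
        return shares

    # Only the modular sum of all shares reveals the (aggregate) update.
    def reconstruct(shares):
        return decode(sum(shares) % MOD)

Each party would distribute shares of its encoded update to the others, so only the modular sum of all shares, and hence only the aggregate update, is ever reconstructed.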