Defense of Backdoor Attacks against Deep Neural Network Classifiers

Open Access
- Author:
- Xiang, Zhen
- Graduate Program:
- Electrical Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- May 03, 2022
- Committee Members:
- George Kesidis, Major Field Member
Constantino Lagoa, Major Field Member
David Miller, Chair & Dissertation Advisor
Anna Squicciarini, Outside Unit & Field Member
Kultegin Aydin, Program Head/Chair
- Keywords:
- adversarial machine learning
backdoor attack
Trojan
backdoor defense
anomaly detection
point cloud
reverse-engineering
- Abstract:
- Deep neural network classifiers (DNNs) are increasingly used in many applications, including security-sensitive ones, but they are vulnerable to adversarial attacks. An emerging type of backdoor attack aims to induce test samples from one or more source classes to be misclassified to a target class whenever a backdoor pattern is present. A backdoor attack can be easily launched by poisoning the DNN’s training set with a small set of samples originally from the source classes, embedded with the same backdoor pattern that will be used at test time, and labeled to the target class. A successful backdoor attack does not degrade the DNN’s accuracy on clean, backdoor-free test samples; such attacks are thus stealthy and cannot be detected using, e.g., validation set accuracy.

Defending against backdoor attacks is very challenging due to the practical constraints associated with each defense scenario. Backdoor defenses deployed before/during training aim to detect whether the training set is poisoned; if it is, the samples with the backdoor pattern should be identified and removed before training. In this scenario, there is no subset of training samples guaranteed to be clean that can be used for reference. Backdoor defenses deployed post-training aim to detect whether a trained DNN has been backdoor attacked. Here, the defender is assumed to have access neither to the DNN’s training set nor to any samples embedded with the backdoor pattern used by the attack, if there actually is an attack. Backdoor defenses deployed during a DNN’s inference phase aim to detect whether a test sample is embedded with a backdoor pattern. In this scenario, the defender does not know the backdoor pattern used by the attacker a priori and must make an immediate detection inference for each test sample.

In this thesis, we mainly focus on the image domain (like most related works) and propose several backdoor defenses deployed before/during training and post-training. For the most challenging post-training defense scenario, we first propose a reverse-engineering defense (RED) that requires neither access to the DNN’s training set nor to any clean classifiers for reference. We then propose a Lagrange-based RED (L-RED) to improve the time and data efficiency of RED. Moreover, we propose a maximum achievable misclassification fraction (MAMF) statistic to address the challenge of reverse-engineering a very common type of patch replacement backdoor pattern, and an expected transferability (ET) statistic to address two-class, multi-attack scenarios where the typical anomaly detection approaches of REDs are not applicable. For the before/during training defense scenario, we first propose a clustering-based approach with a cluster impurity (CI) statistic to distinguish training samples with the backdoor pattern from clean target class samples. We also propose a defense inspired by REDs (for the post-training scenario) that not only identifies training samples with the backdoor pattern, but also “restores” these samples by removing a reverse-engineered backdoor pattern.

While backdoor attacks and defenses have been extensively investigated for images, we extend these studies to other domains. In particular, we devise the first backdoor attack against point cloud classifiers, dubbed the “point cloud backdoor attack” (PCBA); point cloud classifiers play important roles in applications such as autonomous driving. We also extend our RED for images to defend against such PCBAs by leveraging the properties of common point cloud classifiers.

In summary, we provide practical solutions for users to protect their DNN-based devices, systems, and applications from backdoor attacks. Our work also provides insights to the machine learning community on the effects of training set deviation, feature reverse-engineering, and neuron functional allocation; moreover, the empirical evaluation protocols adopted in this thesis can serve as a reference for establishing a standard for measuring the security of DNNs against backdoor attacks.
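To make the poisoning mechanism described in the abstract concrete, below is a minimal, self-contained sketch (not the procedure or code from the thesis itself) of how a patch-type backdoor pattern could be embedded into a few source-class training images that are then relabeled to the target class. The function names (embed_patch, poison_training_set), the patch size and location, and the use of random NumPy arrays as stand-in images are illustrative assumptions.

```python
import numpy as np

def embed_patch(image, patch, top_left=(24, 24)):
    """Stamp a small patch (the backdoor pattern) onto a copy of the image
    by replacing the pixels it covers (a patch-replacement embedding)."""
    poisoned = image.copy()
    r, c = top_left
    h, w = patch.shape[:2]
    poisoned[r:r + h, c:c + w] = patch
    return poisoned

def poison_training_set(images, labels, source_class, target_class,
                        patch, num_poison, rng=None):
    """Return a copy of (images, labels) in which a few source-class samples
    are embedded with the backdoor patch and relabeled to the target class."""
    rng = rng if rng is not None else np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    source_idx = np.flatnonzero(labels == source_class)
    num_poison = min(num_poison, source_idx.size)
    chosen = rng.choice(source_idx, size=num_poison, replace=False)
    for i in chosen:
        images[i] = embed_patch(images[i], patch)
        labels[i] = target_class  # mislabeled to the attacker's target class
    return images, labels, chosen

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((1000, 32, 32, 3)).astype(np.float32)  # toy stand-in "images"
    y = rng.integers(0, 10, size=1000)                     # 10 classes
    patch = np.ones((4, 4, 3), dtype=np.float32)           # a white 4x4 patch
    Xp, yp, idx = poison_training_set(X, y, source_class=3, target_class=7,
                                      patch=patch, num_poison=50, rng=rng)
    print(f"{idx.size} samples poisoned; their labels are now "
          f"{np.unique(yp[idx]).tolist()} (the target class).")
```

Training on such a poisoned set leaves clean-sample accuracy essentially unchanged, which is why the before/during-training and post-training defenses summarized in the abstract aim to detect exactly this kind of contamination rather than relying on validation accuracy.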