Trustworthy Machine Learning: Learning under Security, Explainability and Uncertainty Constraints

Open Access
- Author:
- Le, Thai Quang
- Graduate Program:
- Information Sciences and Technology
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- February 28, 2022
- Committee Members:
- Dongwon Lee, Chair & Dissertation Advisor
- Suhang Wang, Major Field Member
- Prasenjit Mitra, Major Field Member
- S. Shyam Sundar, Outside Unit & Field Member
- Mary Beth Rosson, Program Head/Chair
- Keywords:
- machine learning
- responsible AI
- explainability
- adversarial attack
- adversarial defense
- security
- privacy
- XAI
- trustworthy
- Abstract:
- Trustworthy machine learning models are ones that not only achieve high accuracy but also perform well under realistic constraints, withstand security threats, and remain transparent to their users. By satisfying these requirements, machine learning models can earn users' trust and thus be adopted more readily in practice. This thesis contributes to three aspects of trustworthy machine learning: (i) learning under uncertainty--i.e., learning from limited and/or noisy data; (ii) transparency--i.e., being explainable to end-users; and (iii) security and resilience--i.e., attacking and defending against malicious actors. Specifically, this thesis proposes to overcome the scarcity of high-quality labeled textual data needed to train effective ML classification models by directly synthesizing labeled examples in the data space using generative neural networks. Moreover, it designs a novel algorithm that provides accurate and effective post-hoc explanations of neural networks' predictions to end-users. Furthermore, it demonstrates that a wide range of fake news detection models in the literature are vulnerable to a carefully designed adversarial attack in which attackers promote fake news or demote real news on social media via social discourse. It also proposes a novel defense that adapts the "honeypot" concept from cybersecurity to proactively counter a strong universal trigger attack. Last but not least, it contributes to the adversarial text literature by studying, extracting, and utilizing realistic human-written perturbations observed online rather than machine-generated ones. Through these technical contributions, this thesis aims to advance the adoption of ML systems in high-stakes fields where mutual trust between humans and machines is paramount.
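To make the data-synthesis contribution concrete, below is a minimal sketch of one generic way to generate labeled training text with a label-conditioned causal language model. It is not the thesis's actual algorithm; the "gpt2" checkpoint, the label-prefix prompting scheme, and the `synthesize` helper are all illustrative assumptions.

```python
# Hedged sketch: synthesizing labeled text with a label-conditioned causal LM.
# Assumes the model was fine-tuned on "<label>: <text>" pairs (illustrative only).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def synthesize(label: str, n: int = 5) -> list[str]:
    """Sample n synthetic training examples conditioned on a class label."""
    prompt = f"{label}:"  # label prefix assumed to match fine-tuning format
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,            # stochastic decoding for diverse samples
        top_p=0.95,
        max_new_tokens=60,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt so only the synthesized text remains
    return [tokenizer.decode(o, skip_special_tokens=True)[len(prompt):].strip()
            for o in outputs]

# Each synthetic sentence inherits the conditioning label as its training label.
augmented = [(text, "positive") for text in synthesize("positive")]
```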
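The explanation contribution concerns post-hoc, per-prediction explanations. As a generic illustration only (the thesis proposes its own algorithm), the sketch below scores input tokens by vanilla gradient saliency, assuming a HuggingFace-style classifier that accepts `inputs_embeds` and returns `.logits`.

```python
# Generic gradient-based saliency for a text classifier (illustrative only).
import torch

def token_saliency(model, embeddings: torch.Tensor, target_class: int) -> torch.Tensor:
    """Attribute a prediction to input tokens via the gradient of the target
    logit with respect to each token embedding."""
    embeddings = embeddings.clone().detach().requires_grad_(True)
    logits = model(inputs_embeds=embeddings).logits   # shape: (1, num_classes)
    logits[0, target_class].backward()
    # One score per token: L2 norm of that token's embedding gradient
    return embeddings.grad.norm(dim=-1).squeeze(0)
```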
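The attack studied in the thesis operates through social discourse rather than by editing the article itself. The sketch below only illustrates the threat model's success criterion; `detector.predict` is a hypothetical API, and crafting the adversarial comment (the hard part) is omitted.

```python
# Hypothetical success check for a discourse-based attack on a
# comment-aware fake-news detector.
def attack_succeeds(detector, article: str, comments: list[str],
                    adversarial_comment: str) -> bool:
    """Does injecting one malicious comment flip the detector's verdict?"""
    verdict_before = detector.predict(article, comments)   # hypothetical API
    verdict_after = detector.predict(article, comments + [adversarial_comment])
    return verdict_after != verdict_before
```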
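For the honeypot-style defense, one generic reading of the idea (details differ from the thesis) is to plant "trapdoor" signatures during training so that optimized universal triggers converge toward them and can be recognized at inference. The detection side might look like the sketch below; trapdoor planting is omitted and `tau` is an assumed threshold.

```python
# Hedged sketch: flag inputs whose hidden representation matches a planted
# trapdoor signature too closely (conceptual; not the thesis's exact method).
import torch
import torch.nn.functional as F

def looks_trapped(hidden: torch.Tensor, signatures: torch.Tensor,
                  tau: float = 0.9) -> bool:
    """hidden: (d,) representation of an input; signatures: (k, d) planted
    trapdoor signatures. High similarity suggests a trigger attack."""
    sims = F.cosine_similarity(hidden.unsqueeze(0), signatures, dim=-1)
    return bool((sims > tau).any())
```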
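Finally, for the human-written-perturbation study, a plausible first step is mining character-level edits from pairs of clean and perturbed words observed online. The sketch below uses Python's standard `difflib` purely for illustration; the thesis's actual extraction pipeline is not shown here.

```python
# Illustrative extraction of character-level human edits (e.g., leetspeak).
import difflib

def char_edits(clean: str, perturbed: str) -> list[tuple[str, str, str]]:
    """Extract the non-trivial character edits that turn `clean` into `perturbed`."""
    matcher = difflib.SequenceMatcher(None, clean, perturbed)
    return [(tag, clean[i1:i2], perturbed[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

print(char_edits("stupid", "stup1d"))   # [('replace', 'i', '1')]
```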