Excursions in Causal Data Science: Fairness, Causal Attribution, and Applications
Open Access
- Author:
- Khademi, Aria
- Graduate Program:
- Informatics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 04, 2021
- Committee Members:
- Daniel Susser, Major Field Member
- Sarah Rajtmajer, Major Field Member
- Vasant Honavar, Chair & Dissertation Advisor
- Aleksandra Slavkovic, Outside Unit, Field & Minor Member
- Mary Beth Rosson, Program Head/Chair
- Keywords:
- Artificial Intelligence
- Machine Learning
- Causal Inference
- Interpretable Machine Learning
- Fairness
- Health Care
- Causal Attribution
- Abstract:
- Machine learning (ML) is transforming data-driven discovery and decision making across many areas of human endeavor. High-stakes applications of ML, such as scientific discovery, healthcare, and business decision making, require the predictive models trained using ML to be interpretable by humans and, in many cases, free of undesirable biases that could lead to unfair discrimination on the basis of gender, race, and other protected attributes. This dissertation examines the closely related problems of model interpretability and fairness through a causal lens. The main contributions of the dissertation can be summarized as follows: (i) We reformulate the problem of fairness in decision making as that of estimating the causal effect of protected attributes on outcomes. We offer two causality-grounded measures of fairness and show how to assess fairness by effectively and reliably estimating these measures from observational data, in the absence of randomized controlled trials, using the Rubin-Neyman potential outcomes framework. (ii) We reformulate the problem of explaining the predictions of black box predictive models trained using machine learning, e.g., deep neural networks, as that of elucidating the causal effects of the model's inputs on its outputs. We offer the first model-agnostic, causality-based approach to interpreting black box models without access to the internal structure and parameters of the model; the proposed solution can in principle be applied to any black box model. We show how to reliably interpret predictions of deep neural networks using this approach. (iii) We apply the tools of causal inference to gain insights into the role of different factors in the spread of COVID-19. Our analyses show that commuting ties play an important role in both the spread of COVID-19 and the deaths resulting from it.
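- The abstract states the causal reformulation of fairness without notation. As an illustrative sketch only, using generic Rubin-Neyman notation that is not taken from the dissertation itself (a binary protected attribute A, an outcome Y, and potential outcomes Y(1), Y(0)), the kind of estimand involved in contribution (i) is the average causal effect of the protected attribute on the outcome:

\[
% Illustrative sketch (not the dissertation's own notation):
% average causal effect of a binary protected attribute A on outcome Y,
% written with Rubin-Neyman potential outcomes Y(1) and Y(0).
\tau \;=\; \mathbb{E}\bigl[\,Y(1)\,\bigr] \;-\; \mathbb{E}\bigl[\,Y(0)\,\bigr]
\]

Here Y(1) and Y(0) denote the outcomes an individual would receive with the protected attribute set to 1 or 0, respectively, and assessing fairness amounts to estimating \(\tau\) (or conditional analogues of it) from observational data in which only one potential outcome is observed per individual. Contribution (ii) is analogous in spirit, with the inputs of a black box model playing the role of the treatment and the model's output playing the role of Y.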