Causal Discovery from Relational Data: Theory and Practice

Restricted (Penn State Only)
Author:
Lee, Sanghack
Graduate Program:
Information Sciences and Technology
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
January 22, 2018
Committee Members:
  • Vasant Gajanan Honavar, Dissertation Advisor
  • Vasant Gajanan Honavar, Committee Chair
  • Clyde Lee Giles, Committee Member
  • John Yen, Committee Member
  • Bharath Kumar Sriperumbudur, Outside Member
Keywords:
  • Causality
  • Causal Model
  • Graphical Models
  • Relational Data
  • Relational Model
Abstract:
Discovery of causal relationships from observational and experimental data is a central problem with applications across many areas of scientific endeavor. Over the past decades, there has been considerable progress on algorithms that elicit causal relationships from data through a set of conditional independence queries. Much of this work assumes that the data instances are independent and identically distributed (iid). In many real-world applications, however, the iid assumption is violated because the underlying data exhibit a relational structure of the sort that is modeled in practice by an entity-relationship model. To address the limitations of traditional approaches to learning causal relationships from relational data, the relational causal model was recently introduced. Its key idea is that a cause and its effects stand in a direct or indirect relationship that is reflected in the relational data. Traditional approaches for reasoning with and learning causal models from iid data cannot be trivially applied in the relational setting. Against this background, this dissertation investigates a set of closely related research problems in causal inference with relational data: (i) characterizing the conditional independence relations that hold in a given relational causal model, (ii) sound and complete learning of the structure of a relational causal model using an independence oracle, (iii) measuring the strength of conditional dependence and testing conditional independence among relational variables from relational data, and (iv) robustly learning the structure of a relational causal model from relational data.