Problems of Discrete Inference with Noisy or Incomplete Data
Open Access
Author:
Hehir, Jonathan
Graduate Program:
Statistics
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
February 13, 2023
Committee Members:
Bing Li, Professor in Charge/Director of Graduate Studies David Hunter, Major Field Member Xiaoyue Niu, Dissertation Advisor Aleksandra Slavkovic, Chair & Dissertation Advisor Daniel Kifer, Outside Unit & Field Member
Keywords:
differential privacy community detection spectral clustering distinct counting data sketching
Abstract:
This dissertation highlights a selection of modern problems involving statistical inference on discrete data. At the heart of each of these problems lies a computational challenge not adequately met by the classical or exact methods, motivating the use of fast, approximate methods. These fast approximations---data sketching for distinct counting and spectral clustering for network community detection---have been widely known and used for decades, yet a number of important gaps remain. In particular, we provide novel adaptations of these methods to three areas: distinct counting and community detection with provable guarantees of privacy, and community detection in networks whose community structure is determined by a mixture of observed and latent factors. The contributions of this work are both theoretical and practical. The proposed solutions to each of these problems is accompanied by rigorous proof of statistical consistency as well as empirical evaluation on both real and simulated data.