Recovering Ground Truth Rankings When The Majority Is Misinformed
Restricted (Penn State Only)
- Author:
- Puhan, Amrit
- Graduate Program:
- Informatics
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- June 26, 2024
- Committee Members:
- Hadi Hosseini, Thesis Advisor/Co-Advisor
Justin Silverman, Committee Member
Dongwon Lee, Professor in Charge/Director of Graduate Studies
Luke Zhang, Committee Member
- Keywords:
- Computational Social Choice
AI
Voting
Bayesian Inference
Multi-agent systems
Statistical Modeling
Concentration Analysis
- Abstract:
- We consider the problem of recovering the ground truth ordering (ranking, approval-k, and top-k) over a large number of alternatives through voting. While democratic voting addresses this problem through the wisdom of the crowd, this approach fails when the majority of the crowd is incorrect or misinformed. The surprisingly popular (SP) algorithm is an alternative that can recover the ground truth even when experts are in the minority. The SP algorithm uses meta-information from voters by requiring them to predict other voters’ preferences in the form of a full probability distribution over all rankings of alternatives. However, when the number of alternatives, m, is large, eliciting the prediction report, or even voting over m alternatives, becomes computationally and cognitively complex. To address this, we design a scalable alternative to the SP algorithm that simplifies information elicitation by requiring only partial preferences from the voters. We propose two algorithms—Aggregated-SP and Partial-SP—that ask voters to report a vote and a prediction on a subset of size k (≪ m) in terms of a top alternative, a partial ranking, or an approval set. Through a large-scale crowdsourcing experiment on MTurk, we show that both of our algorithms outperform conventional preference aggregation algorithms in recovering ground truth rankings. Additionally, we propose a concentric mixture of Mallows model to accurately simulate voter behavior. We then assess the similarity between the MTurk data and simulated data using the No-U-Turn Sampler, an extension of Hamiltonian Monte Carlo. We prove theoretically that for a reduced subset size k (≪ m), the sample complexity of our proposed algorithms scales with k! rather than the traditional m!, thus offering a more efficient solution.
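To illustrate the core idea behind the SP algorithm described above, the following is a minimal sketch for a single question with discrete answers. It is not the thesis's implementation (which operates over rankings and partial preferences); the function and variable names are illustrative. Each voter supplies their own answer plus a predicted distribution over the crowd's answers, and the answer that is more popular than predicted wins.

```python
from collections import Counter

def surprisingly_popular(votes, predictions):
    """Toy surprisingly popular (SP) selection for one discrete question.

    votes: list of each voter's own answer.
    predictions: list of dicts, each a voter's predicted fraction of
        the crowd choosing each answer.
    Returns the answer whose actual popularity most exceeds its
    average predicted popularity.
    """
    n = len(votes)
    actual = {a: c / n for a, c in Counter(votes).items()}
    answers = set(actual)
    # Average each answer's predicted vote share across all voters.
    predicted = {
        a: sum(p.get(a, 0.0) for p in predictions) / len(predictions)
        for a in answers
    }
    # Select the answer that is "surprisingly" popular: actual share
    # exceeds the crowd's predicted share by the largest margin.
    return max(answers, key=lambda a: actual[a] - predicted[a])

# Example: a misinformed majority votes "yes", but even the majority
# predicts "yes" will dominate, so the minority answer "no" is the
# surprisingly popular one.
votes = ["yes"] * 6 + ["no"] * 4
predictions = [{"yes": 0.8, "no": 0.2}] * 10
print(surprisingly_popular(votes, predictions))  # → no
```

Here "no" wins because its actual share (0.4) exceeds its predicted share (0.2), while "yes" underperforms its prediction, which is exactly how SP can overrule a misinformed majority.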
We conclude by exploring novel applications of our methodologies in generating preference datasets, specifically for Reinforcement Learning with Human/AI Feedback pipelines, aimed at improving instruction fine-tuning of large language models.
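As a rough illustration of the concentric-mixture idea used for simulating voter behavior, the sketch below builds a mixture of Mallows distributions that share one central ranking but have different dispersion parameters, so each component forms a "ring" of rankings concentrated around the same center. This is a brute-force toy for small m, not the thesis's model or notation; `thetas` and `weights` are assumed parameter names.

```python
import itertools
import math

def concentric_mallows_pmf(reference, thetas, weights):
    """Mixture of Mallows models with a shared central ranking.

    reference: tuple giving the central (ground truth) ranking.
    thetas: dispersion parameter for each mixture component.
    weights: mixing weight for each component (assumed to sum to 1).
    Returns a dict mapping each permutation to its mixture probability.
    Brute-force enumeration, so only feasible for small m.
    """
    pos = {a: i for i, a in enumerate(reference)}

    def kendall_tau(r):
        # Number of pairwise inversions relative to the reference.
        return sum(
            1 for x, y in itertools.combinations(r, 2) if pos[x] > pos[y]
        )

    pmf = {}
    for perm in itertools.permutations(reference):
        d = kendall_tau(perm)
        # Each component assigns probability proportional to exp(-theta * d).
        pmf[perm] = sum(w * math.exp(-t * d) for t, w in zip(thetas, weights))
    z = sum(pmf.values())  # normalize the mixture
    return {p: v / z for p, v in pmf.items()}

# Two components around the same center: one diffuse, one concentrated.
pmf = concentric_mallows_pmf(("a", "b", "c"), thetas=[0.5, 2.0],
                             weights=[0.5, 0.5])
```

By construction, the central ranking receives the highest probability, and the low-theta component spreads mass over distant rankings while the high-theta component concentrates it near the center, mimicking mixed expert/non-expert voter populations.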