Society and Bias: Uncovering Automated Prejudices in Sociotechnical Natural Language Processing Systems

Open Access
- Author: Narayanan Venkit, Pranav
- Graduate Program: Informatics
- Degree: Doctor of Philosophy
- Document Type: Dissertation
- Date of Defense: February 07, 2025
- Committee Members:
- Carleen Maitland, Program Head/Chair
Dongwon Lee, Major Field Member
Shomir Wilson, Chair & Dissertation Advisor
Amulya Yadav, Major Field Member
Rebecca Passonneau, Outside Unit & Field Member
- Keywords:
- Natural Language Processing
Ethics in AI
Social Informatics
Algorithmic Fairness
Bias and Harms in NLP
Human Language Technology
Language Model Evaluation
Responsible AI
- Abstract:
As artificial intelligence expands into diverse sectors such as finance and healthcare, AI systems increasingly shape our social interactions. However, these systems often perpetuate human-like biases, particularly in natural language processing (NLP) applications. While existing research has examined biases related to race and gender, there is a strong motivation for a more comprehensive approach to understanding and addressing biases across all sociodemographic groups. This thesis therefore investigates biases in human language technologies through three complementary facets:
I. Facet MODEL: Examines sociodemographic biases related to disability and nationality across NLP frameworks, including sentiment analysis models, word embeddings, and large language models. This technical analysis quantifies biases using sentiment scores and word vector distances, explores mitigation strategies, and extends the coverage of existing bias identification methods.
II. Facet GAP: Analyzes the disconnect between how AI researchers and the broader public understand key concepts such as 'bias,' 'emotions,' 'sentiment,' and 'hallucination.' Through an interdisciplinary lens combining social informatics, philosophy, and AI, this work demonstrates how the redefinition of technical terminology can lead to harmful biases in real-world applications. The analysis culminates in an ethics sheet for developing socially sensitive technologies.
III. Facet SOCIAL: Uses actor-network theory to study the interaction between human and AI actors in sociotechnical systems. Through human subject studies, this facet redefines bias, examines how biases manifest and evolve into systemic harms, and proposes collaborative development frameworks to create more inclusive solutions.
This research advances both the AI and NLP fields by (1) providing systematic approaches to measuring biases across understudied demographic categories, (2) developing an ethics framework for context-aware language model development, and (3) offering methodologies to evaluate AI's holistic social impact. Together, these facets support the development of more equitable language technologies that better serve all communities through a sociotechnical lens.
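For illustration only, the following is a minimal sketch (not drawn from the dissertation) of the perturbation-style sentiment measurement that Facet MODEL describes: scoring otherwise identical sentences that differ only in a demographic term. It assumes NLTK's VADER sentiment analyzer, and the template sentence and nationality terms are hypothetical probe choices.

```python
# Minimal sketch: compare sentiment scores across demographic perturbations
# of a neutral template sentence. Template and term list are illustrative
# assumptions, not taken from the dissertation.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # lexicon required by VADER

analyzer = SentimentIntensityAnalyzer()
template = "My neighbor is {} and works at the local school."
terms = ["American", "Nigerian", "Indian", "French"]  # hypothetical probe set

for term in terms:
    sentence = template.format(term)
    # 'compound' is VADER's normalized overall sentiment in [-1, 1];
    # a systematic gap between groups on otherwise identical sentences
    # is one signal of sociodemographic bias in the scoring model.
    score = analyzer.polarity_scores(sentence)["compound"]
    print(f"{term:>10}: {score:+.3f}")
```

The same perturbation pattern can be applied to word-embedding distances or large language model outputs, substituting the scoring function while holding the template constant.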