Methods for the Study of Public Opinion using Social Media Data with Applications to the German Refugee Crisis

Restricted (Penn State Only)
Author:
Linder, Fridolin J
Graduate Program:
Political Science
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
July 03, 2018
Committee Members:
  • Bruce A Desmarais Jr., Dissertation Advisor
  • Bruce A Desmarais Jr., Committee Chair
  • Matthew Richard Golder, Committee Member
  • Burt Monroe III, Committee Member
  • Alan MacEachren, Outside Member
  • Pablo Barberá, Special Member
Keywords:
  • Social Media
  • Public Opinion
  • Information Retrieval
  • Refugees
Abstract:
Social media platforms such as Twitter and Facebook have gained increasing relevance in public political discourse in recent years and allow access to large amounts of observational data on citizens' public expression of political opinions. In this dissertation, I develop methods for improving data collection from social media and use Twitter data to study public reactions to the 2015-2016 refugee crisis in Germany. The first chapter addresses the problem of identifying small subsets of relevant documents (for example, tweets) in large databases. I propose a methodological approach that combines query expansion and active machine learning to identify these extremely sparse relevant subpopulations. In the second chapter, I propose a research design for detecting opinion shifts in reaction to local events based on Twitter data. Specifically, I study the reactions of German Twitter users to the allocation of refugees in their geographic proximity. I find that a reaction is detectable and that Twitter users show increased engagement with the topic before the actual allocation event. In the third chapter, I investigate whether attacks on refugees and refugee facilities can be forecast and which predictors produce the best performance for this task. Relying on a rich set of predictors including structural variables, time series dynamics, social media data, and weather data, I compare the predictive performance of several theoretical models at the county-day level using state-of-the-art machine learning techniques. I find that forecasting models offer a 10-fold improvement over the baseline. In line with previous contributions to political science, the strongest predictor appears to be the dynamics of past attacks themselves.