TweetBLM: A Hate Speech Dataset and Analysis of Black_Lives_Matter-Related Microblogs on Twitter

Open Access
- Author:
- Kumar, Sumit
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- July 04, 2023
- Committee Members:
- Rui Zhang, Thesis Advisor/Co-Advisor
- Abhinav Verma, Committee Member
- Chitaranjan Das, Program Head/Chair
- Keywords:
- BLM
- Tweets
- Twitter
- Black Lives Matter
- Model
- Abstract:
- In recent years, the proliferation of toxic and hateful content on social media platforms has become a growing concern. The emergence of movements such as Black Lives Matter has prompted a surge of user-generated responses online, both positive and negative. In this work, we present TweetBLM, a novel dataset of tweets related to the Black Lives Matter movement. The dataset consists of 9,165 manually annotated tweets in two classes, "HATE" and "NON-HATE," assigned according to whether a tweet contains racist content arising in the context of the movement for the Black community. Beyond data collection, we report statistical insights derived from the dataset. We also evaluate a range of machine learning models on the classification task: Logistic Regression, Random Forest, CNN, LSTM, Bi-LSTM, fastText, BERT-base, and BERT-large. With these models, we aim to contribute to the research community's ongoing efforts to identify and mitigate hate speech on the internet. The dataset is publicly available, to encourage collaborative research and the development of effective strategies for combating hate speech online. Through this research, we aspire to contribute to a safer and more inclusive digital environment.
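
The two-class task described in the abstract can be illustrated with a minimal sketch of the simplest model it lists, logistic regression, here over bag-of-words counts in plain Python. This is not the thesis's implementation: the example texts are invented placeholders (not rows from TweetBLM), and the tokenizer, learning rate, and epoch count are assumptions chosen only to keep the sketch short.

```python
import math
from collections import Counter

# Invented placeholder examples for illustration; NOT taken from TweetBLM.
TRAIN = [
    ("we march together for justice and peace", "NON-HATE"),
    ("proud to support equal rights today", "NON-HATE"),
    ("peace and justice for every community", "NON-HATE"),
    ("slur_placeholder people do not belong here", "HATE"),
    ("they are slur_placeholder and should leave", "HATE"),
    ("those slur_placeholder people should leave here", "HATE"),
]

def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def build_vocab(examples):
    # Map each distinct training token to a feature index.
    vocab = {}
    for text, _ in examples:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    # Sparse bag-of-words counts {index: count}; unknown tokens are ignored.
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: c for tok, c in counts.items()}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, vocab, epochs=200, lr=0.5):
    # Stochastic gradient descent on the logistic loss (HATE = 1, NON-HATE = 0).
    w = [0.0] * len(vocab)
    b = 0.0
    data = [(featurize(t, vocab), 1.0 if y == "HATE" else 0.0) for t, y in examples]
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(b + sum(w[i] * v for i, v in x.items()))
            g = p - y  # gradient of the loss with respect to the logit
            for i, v in x.items():
                w[i] -= lr * g * v
            b -= lr * g
    return w, b

def predict(text, vocab, w, b):
    # Threshold the predicted probability of the HATE class at 0.5.
    x = featurize(text, vocab)
    p = sigmoid(b + sum(w[i] * v for i, v in x.items()))
    return "HATE" if p >= 0.5 else "NON-HATE"

vocab = build_vocab(TRAIN)
weights, bias = train(TRAIN, vocab)
train_predictions = [predict(t, vocab, weights, bias) for t, _ in TRAIN]
```

Fitting six toy sentences says nothing about performance on real tweets; a meaningful comparison of the listed models would require held-out data from the actual dataset.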