Analysis of Extremist and Terrorist Groups on Twitter

Open Access
- Author:
- Karimi, Younes
- Graduate Program:
- Informatics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 06, 2024
- Committee Members:
- Dongwon Lee, Professor in Charge/Director of Graduate Studies
Sarah Rajtmajer, Major Field Member
Shomir Wilson, Co-Chair & Dissertation Advisor
Daniel Tavana, Outside Unit & Field Member
Anna Squicciarini, Co-Chair & Dissertation Advisor
- Keywords:
- ISIS
Iran
Extremism
Terrorism
Twitter
X
Social Media
Large Language Model
LLM
User Classification
Propaganda
Islamic Regime
Authoritarian Regime
Islamic State
Extremist Groups
Dataset
Behavior Modeling
Social Network Mining
Image Classification
- Abstract:
- The recent rise in social media use has brought many benefits, such as more connectivity and access to more diverse information, communities, and resources for demonstrating support for various causes. However, social media platforms have also given rise to many unexpected negative phenomena, including misinformation, disinformation, and cyberbullying. Worryingly, Islamist extremist groups like the Islamic State (ISIS) and authoritarian regimes such as the Islamic Regime in Iran (IR) have embraced social media to contact their target audience directly and recruit new members, undermine or threaten their rivals, and spread propaganda messages. My goal in this dissertation is to explore the Twitter activities of these two groups and propose automated approaches for identifying their supporters and propagandistic tweets. I investigate ISIS' social media strategies and characteristics by analyzing accounts linked with ISIS. I focus on these accounts' makeup, individual and social network attributes, locations, language and media use, and top hashtags, keywords, topics, and concepts in their messages. I use a dataset of 1.3 million tweets, including tweets from known ISIS users collected during 2014–2016, when these accounts were highly active, and build a model that identifies ISIS users with 92.4% accuracy. I also compare the characteristics of ISIS users with those of users who have retweeted or quoted their tweets by collecting a longitudinal dataset of 10 million tweets. Unsurprisingly, I find that 89% of retweeters and 73% of quoters are likely to be affiliated with ISIS, yet have not been suspended. I identify candidate ISIS propaganda messages by extracting ISIS tweets that have gained abnormally high engagement despite being authored by users with a small follower network. I inspect these messages, their characteristics, and their authors to identify attributes that may have helped them gain traction.
Through my investigation of a subset of the most popular ISIS tweets, I find preliminary evidence of strategic behavior among their users. They do not talk only about ISIS- and Islam-related content; their tweets can also contain non-propagandistic content and discussions about football games, criticisms of Egypt's president, the execution of Sunni preachers in Iran, or the former leader of Boko Haram. This could be part of their strategy to display normal behavior, draw attention away from their radical content and connections to other ISIS users, and avoid bot and malicious activity detection efforts by Twitter. I discover that over 10% of ISIS tweets posted in February 2014 are potentially propaganda. Additionally, among 544 ISIS users with candidate propaganda tweets, 346 users do not have any non-propaganda tweets in the dataset. After extracting named entities from the tweets of ISIS users and of their retweeters, quoters, and mentioners, I find that most of the retweeters' hashtags are religion-related. In contrast, the top hashtags for the other three groups mainly indicate locations. Besides the textual data and metadata, I analyze photos attached to popular ISIS tweets and build an image classifier with 84.21% overall accuracy to automatically categorize them into religion, military, and news. The image classifier correctly labels 69% of the ISIS propaganda photos, which are mostly news-related. Furthermore, I analyze social media strategies used by an Islamic authoritarian political entity, which shares notable similarities with ISIS. I collected over 28 million tweets associated with the 2022 upheaval in Iran after the death of Mahsa Amini, consisting of tweets from (1) protesters and IR opponents and (2) regime supporters.
Using the follower networks of prominent accounts associated with these two political sides, I perform large-scale labeling of a large number of users in the initial dataset and show that this network-based labeling achieves 98% accuracy. I employ the first dataset to fine-tune an existing BERT-based Large Language Model (LLM) for Persian (ParsBERT) to distinguish tweets supporting the IR from those opposing it. I define multiple aggregation methods to combine these tweet-level classifications and generate a single label for each user using all or only a portion of their historical tweets. I evaluate and compare the performance of this user classifier with two other classifiers I built using individual differentiating user profile attributes and hashtags, and demonstrate how an ensemble classifier incorporating both my LLM-based and hashtag-based classifiers can achieve an accuracy of 87% and perform better at detecting regime supporters. I also explore various propaganda techniques employed by the IR to influence the online community and manipulate public opinion and trends, and propose an automated approach that could potentially be used for detecting these techniques within tweets. I propose a label bootstrapping approach for manually labeling a subset of tweets to ensure that enough likely propaganda samples are identified and selected. This method consists of an iterative active learning phase in which new query (labeling) samples are chosen according to their semantic similarity to the previously identified propaganda messages. Then, multiple weak learners trained on the labeled samples determine borderline samples that cannot be classified with enough certainty. Finally, I model and cluster the Twitter activities of the most active users and show that retweeting and quoting are the most frequent activities among users in this dataset.
I further analyze the potential influence of anti- and pro-regime users by comparing, for each side, the tweets and users that have gained abnormally high engagement and an unusually high rate of follower growth. I show that the frequency of anti-regime tweets and users with notable engagement is three orders of magnitude larger than that of pro-regime ones, and that drastic follower growth is considerably more common among regime opponents, most of whom are not the most active users. I also identify users who have gained the highest numbers of retweets, likes, quotes, and replies and show that although 17% of the followers of the IR supreme leader (111,505 users) created their accounts after the death of Mahsa Amini, potentially to support the regime, amplify its voice, and spread its propaganda, IR supporters have failed to gain comparable engagement.
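
As an illustration of the tweet-to-user aggregation step mentioned in the abstract, the following minimal Python sketch shows two simple ways tweet-level predictions could be combined into one user label. It is not the dissertation's implementation: the abstract does not specify the aggregation methods, so the function names, the "pro"/"anti" label strings, the 0.5 threshold, and the `last_n` parameter are all assumptions made for this example.

```python
from collections import Counter
from statistics import mean


def aggregate_majority(tweet_labels):
    """Hypothetical aggregation 1: majority vote over tweet-level labels.

    tweet_labels: list of "pro" / "anti" strings, one per tweet.
    Ties go to the label that appears first in the list.
    """
    if not tweet_labels:
        raise ValueError("user has no classified tweets")
    return Counter(tweet_labels).most_common(1)[0][0]


def aggregate_mean_prob(pro_probs, threshold=0.5, last_n=None):
    """Hypothetical aggregation 2: average the classifier's P(pro-regime).

    pro_probs: per-tweet probabilities in chronological order.
    last_n: if given, use only the user's last_n most recent tweets,
            mirroring the idea of using a portion of the history.
    """
    if not pro_probs:
        raise ValueError("user has no classified tweets")
    probs = pro_probs[-last_n:] if last_n else pro_probs
    return "pro" if mean(probs) >= threshold else "anti"


if __name__ == "__main__":
    # Toy values for illustration only; not data from the dissertation.
    print(aggregate_majority(["pro", "anti", "anti"]))
    print(aggregate_mean_prob([0.9, 0.8, 0.7, 0.2]))
    print(aggregate_mean_prob([0.9, 0.8, 0.7, 0.2], last_n=1))
```

Majority voting ignores classifier confidence, whereas averaging probabilities lets a few confident predictions outweigh many uncertain ones; restricting to recent tweets trades stability for sensitivity to changes in a user's stance.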