DETECTING OFFENSIVE LANGUAGE IN SOCIAL MEDIAS FOR PROTECTION OF ADOLESCENT ONLINE SAFETY
Open Access
- Author:
- Chen, Ying
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- November 08, 2011
- Committee Members:
- Sencun Zhu, Thesis Advisor/Co-Advisor
Heng Xu, Thesis Advisor/Co-Advisor
- Keywords:
- offensive language
children online safety
social media
- Abstract:
- Adolescents today rely heavily on social media to interact with other people. Given the complicated environment of social media, it has become very difficult for adolescents to avoid encountering offensive content from time to time. Because the textual content on online social media is highly unstructured, informal, and often misspelled, existing methods for message-level offensive language detection cannot accurately detect offensive content, and user-level offensiveness evaluation is still an under-researched area. To bridge this gap, we propose the Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potentially offensive users in social media. We distinguish the contributions of pejoratives/profanities and obscenities in determining offensive content, and introduce hand-authored syntactic rules for identifying name-calling harassment. In particular, we incorporate users’ writing style, structure, and specific cyberbullying content as features to predict a user’s potential to send out offensive content. Experimental results show that the LSF framework achieves significantly better performance than existing methods in offensive content detection. It correctly categorizes 94.34% of offensive sentences and 98.24% of non-offensive sentences, and 90.2% of offensive users and 86.3% of non-offensive users. Meanwhile, the processing speed of LSF is approximately 10 ms per sentence, suggesting the potential for effective deployment on online social media. We believe such a language processing model will greatly help online offensive language monitoring and, eventually, help build a better online environment.