DETECTING OFFENSIVE LANGUAGE IN SOCIAL MEDIAS FOR PROTECTION OF ADOLESCENT ONLINE SAFETY

Open Access
Author:
Chen, Ying
Graduate Program:
Computer Science and Engineering
Degree:
Master of Science
Document Type:
Master Thesis
Date of Defense:
November 08, 2011
Committee Members:
  • Sencun Zhu, Thesis Advisor
  • Heng Xu, Thesis Advisor
Keywords:
  • offensive language
  • children online safety
  • social media
Abstract:
Adolescents now rely heavily on social media to interact with other people. Given the complex environment of social media, it has become very difficult for adolescents to avoid encountering offensive content. Because text on social media is highly unstructured, informal, and often misspelled, existing message-level offensive language detection methods cannot detect offensive content accurately, and user-level offensiveness evaluation remains under-researched. To bridge this gap, we propose the Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potentially offensive users in social media. We distinguish the contributions of pejoratives/profanities and of obscenities to determining offensive content, and introduce hand-authored syntactic rules for identifying name-calling harassment. In addition, we incorporate users’ writing style, structure, and specific cyberbullying content as features to predict each user’s potential to send offensive content. Experimental results show that the LSF framework performs significantly better than existing methods at offensive content detection. It correctly classifies 94.34% of offensive sentences and 98.24% of non-offensive sentences, as well as 90.2% of offensive users and 86.3% of non-offensive users. LSF processes a sentence in approximately 10 ms, suggesting that it could be deployed effectively on social media platforms. We believe such a language processing model can greatly aid online offensive language monitoring and, ultimately, help build a better online environment.
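The abstract describes LSF only at a high level. As a rough illustration of how lexical and syntactic cues can be combined, the following Python sketch scores a sentence using two separate word lists (pejoratives/profanities versus obscenities, weighted differently) plus one name-calling rule, then aggregates sentence decisions per user. Everything here is hypothetical: the lexicons, weights, threshold, 4-token window, and function names are assumptions for illustration. The simple "second-person pronoun followed closely by a pejorative" check stands in for the thesis's hand-authored syntactic rules rather than reproducing them, and the user-level ratio ignores the writing-style and structural features the abstract mentions.

```python
import re

# Hypothetical toy lexicons (assumptions, not the thesis's actual word lists).
PEJORATIVES = {"idiot", "loser", "moron", "stupid"}  # offensive mainly when aimed at someone
OBSCENITIES = {"crap", "damn"}                       # vulgar regardless of target
SECOND_PERSON = {"you", "u", "youre"}                # forms that address the reader

# Hypothetical weights and settings chosen only for illustration.
W_PEJORATIVE = 1.0
W_OBSCENITY = 0.5
W_NAME_CALLING = 2.0
WINDOW = 4        # tokens after a second-person form to search for a pejorative
THRESHOLD = 1.5   # sentences scoring at or above this are flagged


def tokenize(sentence: str) -> list[str]:
    """Lowercase, drop apostrophes (so "you're" -> "youre"), keep letter runs."""
    text = sentence.lower().replace("'", "").replace("\u2019", "")
    return re.findall(r"[a-z]+", text)


def name_calling(tokens: list[str]) -> bool:
    """Crude stand-in for a syntactic rule: a second-person form followed
    within WINDOW tokens by a pejorative, e.g. "you are such a loser"."""
    for i, tok in enumerate(tokens):
        if tok in SECOND_PERSON and any(
            t in PEJORATIVES for t in tokens[i + 1 : i + 1 + WINDOW]
        ):
            return True
    return False


def offensiveness(sentence: str) -> float:
    """Weighted lexical score plus a bonus when the name-calling rule fires."""
    tokens = tokenize(sentence)
    score = W_PEJORATIVE * sum(t in PEJORATIVES for t in tokens)
    score += W_OBSCENITY * sum(t in OBSCENITIES for t in tokens)
    if name_calling(tokens):
        score += W_NAME_CALLING
    return score


def is_offensive(sentence: str) -> bool:
    return offensiveness(sentence) >= THRESHOLD


def user_offensive_ratio(sentences: list[str]) -> float:
    """Share of a user's sentences flagged offensive (a simplified user-level signal)."""
    if not sentences:
        return 0.0
    return sum(is_offensive(s) for s in sentences) / len(sentences)


if __name__ == "__main__":
    for s in ["You are such a loser", "that movie was crap", "The loser of the game was sad"]:
        print(f"{s!r}: score={offensiveness(s)}, offensive={is_offensive(s)}")
```

With these toy settings, "that movie was crap" scores 0.5 and "The loser of the game was sad" scores 1.0, so neither is flagged, while "You are such a loser" scores 3.0 because the pejorative is aimed at the reader. Weighting targeted pejoratives above untargeted obscenities echoes the abstract's distinction between the two categories; the actual weighting used by LSF is not specified here.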