Detecting Policy Violators in Online Social Community: An Extended Bayesian Belief Network Approach

Open Access
Huang, Shuo
Graduate Program:
Computer Science and Engineering
Master of Science
Document Type:
Master Thesis
Date of Defense:
June 22, 2012
Committee Members:
  • Sencun Zhu, Thesis Advisor
  • John Yen, Thesis Advisor
  • Anna Cinzia Squicciarini, Thesis Advisor
Keywords:
  • Social Network
  • Spam
  • Page Rank
  • Bayesian Network
  • Mutual Information
Abstract:
In this thesis, I implement a solution for detecting policy violators in online social communities. Given the growing number of users and volume of traffic in online social services such as forums, it is difficult for administrators to oversee user activity manually. To address this problem, the thesis implements a risk warning system based on Bayesian networks (BN). A BN is, first, a directed acyclic graph. Each node represents a hypothesis that the user has a certain attribute, and the arcs describe causal relationships between nodes. In this thesis, the BN is designed as a naïve Bayes classifier; in other words, all hypotheses are assumed to be independent of each other. Second, a BN is a statistical model. The collected data represent behavioral features of a user. After being processed by the input nodes, these features become hypotheses about the user. Input nodes are the parents of intermediate nodes, each of which represents an intermediate attribute of the user. Intermediate nodes are the parents of the core model nodes, which model the intent, opportunity, and capability of the monitored user. The core model nodes are in turn the parents of the result node, which produces a conditional probability. This value indicates a binary outcome: whether or not the user is malicious. The solution includes a number of key techniques for data mining and processing. For example, PageRank and Degree Centrality are implemented to monitor the popularity of the user, while Sentiment Analysis, Topic Mining, and Mutual Information are used to assess the Authenticity and Relevance of user-generated content. For testing and evaluation, the thesis uses real-world data from a top forum. The tests are designed to include a sample of fifty users with more than ten thousand posts per user.
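As an illustration of how a result node could turn evidence about intent, opportunity, and capability into a probability and then a binary outcome, here is a minimal naive-Bayes-style sketch. It is not the network from the thesis: the prior, the likelihood values, the 0.5 threshold, and the names `PRIOR_MALICIOUS`, `LIKELIHOODS`, `posterior_malicious`, and `is_malicious` are all assumptions made up for this example.

```python
# Illustrative only: the prior and likelihoods below are made-up numbers,
# not values from the thesis.
PRIOR_MALICIOUS = 0.1

# P(evidence is present | class) for each core model node (hypothetical values).
LIKELIHOODS = {
    "intent":      {"malicious": 0.8, "benign": 0.2},
    "opportunity": {"malicious": 0.7, "benign": 0.4},
    "capability":  {"malicious": 0.6, "benign": 0.3},
}


def posterior_malicious(evidence):
    """Return P(malicious | evidence) under the naive independence assumption.

    `evidence` maps each core node name to True (present) or False (absent).
    """
    score_mal = PRIOR_MALICIOUS
    score_ben = 1.0 - PRIOR_MALICIOUS
    for node, present in evidence.items():
        p_mal = LIKELIHOODS[node]["malicious"]
        p_ben = LIKELIHOODS[node]["benign"]
        # Use P(present | class) or its complement, depending on the observation.
        score_mal *= p_mal if present else 1.0 - p_mal
        score_ben *= p_ben if present else 1.0 - p_ben
    # Normalize so the two class scores sum to one.
    return score_mal / (score_mal + score_ben)


def is_malicious(evidence, threshold=0.5):
    """Turn the posterior into a binary outcome by thresholding (assumed rule)."""
    return posterior_malicious(evidence) >= threshold
```

With these made-up numbers, a user for whom all three core attributes are observed gets a posterior above the threshold, while a user with none of them gets a posterior close to zero. Multiplying per-node likelihoods is only valid because of the independence assumption stated in the abstract.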
To evaluate the performance of the BN, true positive and true negative ratios are carefully counted against the judgments of a human moderator. A true positive occurs when the BN correctly flags an abusive post. A true negative occurs when the post is not abusive and the BN does not flag it. The test results show very high ratios of true positives and true negatives compared with the human moderator.
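The comparison against a human moderator can be sketched as follows, using the standard confusion-matrix definitions, in which a true positive is an abusive post that is flagged and a true negative is a non-abusive post that is not flagged. The function names `confusion_counts` and `rates` and the boolean label encoding are assumptions for this example, not the evaluation code used in the thesis.

```python
def confusion_counts(predicted, moderator):
    """Count TP, TN, FP, FN, treating the moderator's labels as ground truth.

    Both arguments are equal-length sequences of booleans (True = abusive).
    """
    if len(predicted) != len(moderator):
        raise ValueError("label sequences must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for pred, truth in zip(predicted, moderator):
        if pred and truth:
            counts["tp"] += 1   # flagged and actually abusive
        elif not pred and not truth:
            counts["tn"] += 1   # not flagged and actually fine
        elif pred and not truth:
            counts["fp"] += 1   # flagged but actually fine
        else:
            counts["fn"] += 1   # missed an abusive post
    return counts


def rates(counts):
    """Return (true positive rate, true negative rate); None if undefined."""
    positives = counts["tp"] + counts["fn"]
    negatives = counts["tn"] + counts["fp"]
    tpr = counts["tp"] / positives if positives else None
    tnr = counts["tn"] / negatives if negatives else None
    return tpr, tnr
```

Reporting both rates matters because abusive posts are usually a small minority: a system that never flags anything would score a perfect true negative rate while catching nothing.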