Enabling Easier Information Access In Online Discussion Forums

Open Access
Bhatia, Sumit
Graduate Program:
Computer Science and Engineering
Doctor of Philosophy
Document Type:
Date of Defense:
June 07, 2013
Committee Members:
  • Prasenjit Mitra, Dissertation Advisor
  • Wang Chien Lee, Committee Member
  • Daniel Kifer, Committee Member
  • Xiaofei Lu, Special Member
Keywords:
  • online discussion forums
  • web forums
  • query suggestion
  • discussion summarization
  • speech act
  • dialog act
Abstract:
Online discussion forums have become popular in recent times. They provide a platform where people from different parts of the world who share a common interest can come together to discuss topics of mutual interest and seek solutions to their problems. There are hundreds of thousands of internet forums containing tens of millions of discussion threads, which makes them an important source of human-generated information that needs to be managed efficiently. In this dissertation, I focus on the following three specific problems:

1. Searching for relevant discussion threads in an online forum archive. A typical discussion thread differs from a generic web page in its structure, its linking patterns, and its content, which is contributed by a large number of participants. A probabilistic retrieval model is proposed that takes into account structural properties, content properties, and various non-textual relevance indicators such as thread popularity and user expertise. The proposed retrieval model achieved significant improvements over a standard language-model-based retrieval model and over methods that are typically used in online forum websites.

2. Offering query suggestions in a forum search engine. Compared to a web search engine, a typical forum website receives a much smaller number of search requests, and hence the query log of a forum search engine is small. A probabilistic query suggestion mechanism is proposed that does not rely on query logs and can offer suggestions by computing completions from the forum corpus itself. Experimental results on two different datasets show that the proposed approach achieved statistically significant improvements over two state-of-the-art baseline query suggestion techniques.

3. Identifying the role of each user message in a discussion. Different messages in a thread serve different purposes in the discussion. I investigated the problem of classifying individual user posts in an online discussion thread. For post classification, I designed and experimented with a variety of features derived from the post’s content, thread structure, user behavior, and sentiment analysis of the post’s text. Applications of post classification are also demonstrated in forum thread retrieval and discussion summarization tasks.