AUTHORING CONTENT AUTOMATICALLY USING ABSTRACTIVE SUMMARIZATION

Open Access
Author:
Banerjee, Siddhartha
Graduate Program:
Information Sciences and Technology
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
December 09, 2016
Committee Members:
  • Prasenjit Mitra, Dissertation Advisor
  • Prasenjit Mitra, Committee Chair
  • Xiaofei Lu, Committee Member
  • John Yen, Committee Member
  • David Reitter, Outside Member
Keywords:
  • Abstractive summarization
  • Automatic authoring
  • Text summarization
  • Text-to-text generation
Abstract:
Automatic text summarization has emerged as a very popular area of research due to the availability of enormous amounts of data from multiple sources such as news articles, social media, and online forums. Summarization has been classified broadly into two major categories: extractive and abstractive. Extractive summarizers identify the most important sentences from the input, which can be a single document or a set of documents. By contrast, abstractive methods aim to “understand” the entire text, interpreting the facts and retelling the text in fewer words. Research on abstractive summarization has been limited due to the inherent difficulty of generating abstractive summaries, which requires, for example, text understanding and grammatical text generation. Abstractive summarization approaches that rely on template-based methods and language realization techniques are not exhaustive across domains and are generally costly due to the manual construction of rules. In this dissertation, I first describe several data-driven text-to-text generation techniques for abstractive summarization that can generate informative and readable summaries from input documents such as news articles and meeting transcripts. My proposed techniques rely on syntactic analysis of the text to fuse information from multiple textual components, followed by an optimization technique that selects the most important elements from the input while simultaneously optimizing the linguistic quality of the summaries. Experimental results demonstrate that the proposed summarization models significantly outperform various state-of-the-art models. Second, I describe an end-to-end application of abstractive summarization to author encyclopedic articles. Content from the web is retrieved and assigned to appropriate topical sections in the articles using a text classifier, followed by abstractive summarization of the content assigned to the sections.
Experimental results from both automatic and manual evaluations demonstrate that my proposed approaches are more effective than existing techniques for Wikipedia article generation.