AUTOMATIC TEXT-BASED EXPLANATION OF EVENTS

Debnath, Sandip

Open Access

Author:: Debnath, Sandip
Graduate Program:: Computer Science and Engineering
Degree:: Doctor of Philosophy
Document Type:: Dissertation
Date of Defense:: August 22, 2005
Committee Members:: Arun Upneja, Committee Member
Tracy Mullen, Committee Member
Hongyuan Zha, Committee Member
Prasenjit Mitra, Committee Chair/Co-Chair
C Lee Giles, Committee Chair/Co-Chair
Keywords:: Text-based explanation
sentence-based explanation
keyword-based explanation
Abstract:: With the abundance of publicly available information on the Web reflecting the ever-changing nature of world events, one question comes to mind: how can we explain a specific event using this vast amount of information? We are all aware that a huge information explosion occurred with Internet publishing during the last decade. This information source is enormous in size and freely available, but its unstructured nature and the amount of inherent noise demand sophisticated techniques to understand the reasons behind any event. Modern-day search engines are too general in nature. They only try to find the document set that contains the query keyword(s) (using binary rules like "AND", "OR", etc., or exact matches) and is supposedly most relevant to those keyword(s). But this method has an inherent limitation: in the end, they are just query-matching engines. They neither have nor require the domain knowledge and specialized document processing techniques needed to provide a deeper perspective on any event. By the word "event" we mean a particular occurrence with which we can associate a time. For example, "the IBM stock price went up in the third week of June, 2004" or "More than 1.5 million people lost electricity in Florida in September 2004" are events which happened and with which we can associate a sense of time, and which we can thus situate temporally relative to other events. As we open our newspapers we see millions of descriptions of such "events". In fact, newspapers and news articles are essentially lists of events. These news articles are a huge information source, but they are unstructured compared to a structured information source such as a regular relational database. Because of these properties, it is hard to understand and make sense of them.
In this thesis, I propose a novel assembly of techniques which can be applied together on an unstructured information source (in the form of text-based news articles, reports, and other text-based documents). These techniques, applied in a step-by-step fashion, permit proper analysis of the information source and can provide text-based explanations for major events that have occurred during a given time-frame. Given a noisy, unstructured, weekly-organized source of information, our method can reduce noise with high precision, sort the documents (according to specified attributes) with high accuracy, identify those relevant to the domain of concern with high precision, and finally present a text-based explanation (in the form of keywords, phrases, or even sentences) for a particular event.
