Convolutional Neural Network and Question Generation Based Approaches to Select Best Answers for Non-Factoid Questions

Open Access
- Author:
- Srinath, Mukund
- Graduate Program:
- Information Sciences and Technology
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- March 19, 2019
- Committee Members:
- Dongwon Lee, Thesis Advisor/Co-Advisor
- Keywords:
- answer selection
- question generation
- NLP
- Abstract:
- The answer selection task involves selecting the most appropriate answer to a question from a list of candidate answers. This thesis addresses a subset of that task: answer selection for non-factoid questions. Non-factoid questions are those that cannot be answered in a single word or phrase. They usually have long answers that share few words with the question. Two methods are discussed. The first is a Convolutional Neural Network (CNN) method that creates distributed vector representations of questions and answers and learns to minimize the distance (in vector space) between each question and its most appropriate answer. The second is a question generation approach in which a Seq2Seq model is trained to generate questions from the given answers. The CNN described above is then used to create vector representations of the questions, minimizing the distance between the true question and the question generated from the true answer. Answer selection is treated as an information retrieval task, and precision@1 and mean reciprocal rank (MRR) scores are reported. Evaluation is carried out on two datasets: Yahoo Webscope L6, a standard dataset for the answer selection task, and the Library corpus, a custom dataset of student responses to information literacy questions submitted to earn online micro-credentials. The CNN model improves precision@1 over state-of-the-art models on the Library corpus and shows comparable performance on the Yahoo Answers corpus. The results obtained with the question generation approach are promising and suggest steps for future work.
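The two retrieval metrics named in the abstract can be illustrated with a minimal sketch. It assumes each question comes with a list of candidate answers, a model score per candidate (higher meaning closer to the question), and a 0/1 label marking the correct answer. The function names and the toy scores are hypothetical and are not taken from the thesis.

```python
def rank_labels(scores, labels):
    """Return the 0/1 labels reordered by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def precision_at_1(ranked):
    """Fraction of questions whose top-ranked candidate is correct."""
    return sum(labels[0] for labels in ranked) / len(ranked)


def mean_reciprocal_rank(ranked):
    """Average of 1/rank of the first correct candidate (0 if none is correct)."""
    total = 0.0
    for labels in ranked:
        for rank, label in enumerate(labels, start=1):
            if label:
                total += 1.0 / rank
                break
    return total / len(ranked)


# Toy data (illustrative only): (scores, labels) per question.
questions = [
    ([0.9, 0.2, 0.4], [1, 0, 0]),  # correct answer ranked 1st
    ([0.1, 0.8, 0.3], [1, 0, 0]),  # correct answer ranked 3rd
    ([0.5, 0.7], [1, 0]),          # correct answer ranked 2nd
]
ranked = [rank_labels(s, l) for s, l in questions]
print(precision_at_1(ranked), mean_reciprocal_rank(ranked))
```

Precision@1 only rewards a correct top choice, while MRR also gives partial credit when the correct answer appears lower in the ranking, which is why the two are commonly reported together.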