Adaptable Conversational Chatbot Based on Reinforcement Learning
Restricted (Penn State Only)
- Author:
- George, Haruka
- Graduate Program:
- Data Analytics
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- March 17, 2023
- Committee Members:
- Raghu Sangwan, Program Head/Chair
- Youakim Badr, Thesis Advisor/Co-Advisor
- Adrian Sorin Barb, Committee Member
- Keywords:
- Reinforcement Learning
- Chatbot
- Seq2seq
- LSTM
- Deep Reinforcement Learning
- NLP
- Abstract:
- A chatbot, defined in the dictionary as a “computer program designed to simulate conversation with human users, especially over the Internet,” has become an essential part of many sectors, including education and commerce. A chatbot aims to deliver human-like responses that are meaningful and coherent. Seq2seq encoder-decoder chatbot models are currently among the state of the art; however, they still produce repetitive responses and are highly data intensive. It is therefore necessary to build a chatbot model that retains the advantages of Seq2seq encoder-decoder models while improving on these drawbacks. In addition, many current chatbot models do not consider the user's profile when adapting a response. In several situations, such as an academic setting, the user's profile plays an important role in determining the level of response expected. This research proposes an adaptable chatbot model, based on Deep Reinforcement Learning (DRL) and an encoder-decoder architecture, that considers user profiles when producing responses. The proposed model, trained with Reinforcement Learning (RL), is compared with a baseline encoder-decoder (Seq2seq) model using BLEU and ROUGE scores. The adaptable RL-based model, built on a trained Seq2seq encoder-decoder model, achieves a BLEU score of 41.2, versus 39.7 for the baseline encoder-decoder model. The proposed model also achieves higher ROUGE-1 and ROUGE-2 scores of 64.9 and 45.0, while the baseline encoder-decoder model scores only 63.5. Both the ROUGE and BLEU scores show that the proposed DRL method outperforms the baseline model.
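For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of n-gram overlap scoring, the idea underlying both ROUGE-N and BLEU. It is an illustration only and not the thesis's evaluation code: the function names (`ngrams`, `overlap`, `rouge_n_f1`, `clipped_precision`), the whitespace tokenization, and the example sentences are all assumptions, and the values it returns are fractions in [0, 1] rather than the 0 to 100 scale on which such scores are often reported.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap(candidate, reference, n):
    """Return (clipped match count, candidate n-gram total, reference n-gram total)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count per n-gram, i.e. clipping.
    return sum((cand & ref).values()), sum(cand.values()), sum(ref.values())


def rouge_n_f1(candidate, reference, n):
    """ROUGE-N style F1: harmonic mean of n-gram precision and recall."""
    match, cand_total, ref_total = overlap(candidate, reference, n)
    if match == 0:
        return 0.0
    precision = match / cand_total
    recall = match / ref_total
    return 2 * precision * recall / (precision + recall)


def clipped_precision(candidate, reference, n):
    """Clipped n-gram precision, the per-order building block of BLEU.

    Full BLEU additionally combines several n-gram orders with a geometric
    mean and applies a brevity penalty; that part is omitted here.
    """
    match, cand_total, _ = overlap(candidate, reference, n)
    return match / cand_total if cand_total else 0.0


# Hypothetical example sentences, tokenized on whitespace.
candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()

rouge1 = rouge_n_f1(candidate, reference, 1)
rouge2 = rouge_n_f1(candidate, reference, 2)
bleu1_precision = clipped_precision(candidate, reference, 1)
```

In this toy pair, five of the six candidate unigrams and three of the five candidate bigrams also appear in the reference, so the unigram-based score is higher than the bigram-based one, which mirrors the usual gap between ROUGE-1 and ROUGE-2.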