AMAZON FINE FOOD REVIEWS - DESIGN AND IMPLEMENTATION OF AN AUTOMATED CLASSIFICATION SYSTEM

Open Access
- Author:
- Sharedalal, Rutvik
- Graduate Program:
- Industrial Engineering
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- March 28, 2019
- Committee Members:
- Soundar Rajan Tirupatikumara, Thesis Advisor/Co-Advisor
- Keywords:
- Text mining
Text classification
Online consumer reviews
Review helpfulness
Word embedding
- Abstract:
- Social media has given consumers ample opportunity to gauge product quality by reading and examining the reviews posted by users of online shopping platforms. Moreover, online platforms such as Amazon.com let users label a review as ‘Helpful’ if they find its content valuable. This helps both consumers and manufacturers evaluate general preferences efficiently by focusing mainly on the reviews marked helpful. However, recently posted reviews receive comparatively few votes, and highly voted reviews come to users’ attention first. This study addresses these issues by building an automated text classification system that predicts the helpfulness of online reviews irrespective of when they were posted. The study is conducted on data collected from Amazon.com consisting of reviews of fine food. Previous research has focused mostly on finding a correlation between the review helpfulness measure and review content-based features. In addition to identifying significant content-based features, this study predicts review helpfulness using three different approaches, based on vectorized features, review- and summary-centric features, and word embedding-based features. Moreover, the conventional classifiers used for text classification (support vector machine, logistic regression, and multinomial naïve Bayes) are compared with a decision tree-based ensemble classifier, namely extremely randomized trees. The extremely randomized trees classifier is found to outperform the conventional classifiers except in the case of vectorized features with unigrams and bigrams. Among the feature sets, vectorized features perform much better than the others.
This study also found that content-based features such as review polarity, review subjectivity, review character and word counts, average review word length, and summary character count are significant predictors of review helpfulness.
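As an illustration of the length-based content features named above, the following is a minimal sketch and not the thesis's actual implementation. The function name `surface_features`, the dictionary keys, and the tokenization rule (runs of letters, digits, and apostrophes) are assumptions made for this example. Polarity and subjectivity are left out because they require a sentiment-analysis tool, which the abstract does not name.

```python
import re


def surface_features(review: str, summary: str) -> dict:
    """Compute length-based features for one review and its summary.

    Hypothetical helper: the tokenization rule below is an assumption,
    since the abstract does not say how words were counted.
    """
    # Treat each run of letters, digits, or apostrophes as one word.
    words = re.findall(r"[A-Za-z0-9']+", review)
    word_count = len(words)
    # Guard against empty reviews to avoid dividing by zero.
    avg_word_length = (
        sum(len(w) for w in words) / word_count if word_count else 0.0
    )
    return {
        "review_char_count": len(review),
        "review_word_count": word_count,
        "review_avg_word_length": avg_word_length,
        "summary_char_count": len(summary),
    }


# Made-up example text, not taken from the Amazon dataset.
features = surface_features("Great tea, fast shipping.", "Great tea")
print(features)
```

Each review would yield one such feature dictionary, which could then be combined with sentiment scores and passed to a classifier.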