Abstract:
With the growth of e-commerce applications, almost everything is now sold and
purchased online. Customers leave reviews of the products they purchase, and other
customers read these reviews before making a purchase. By using Natural Language
Processing on these reviews, we propose a supervised model that predicts their
polarity, i.e., whether a review is negative or positive, which can in turn be used to
identify whether the quality of a product is good or not. We
collected a data set of smartphone reviews from Amazon and labelled it to train and
test the model. On this data set we carried out a comparative analysis of Logistic
Regression, Linear Regression, Decision Tree, Random Forest, and Support Vector
Machine classifiers, together with pre-processing techniques including tokenization,
lemmatization, stop-word removal, and part-of-speech tagging to filter the data set.
Term Frequency-Inverse Document Frequency (TF-IDF) was used to convert words to vectors
for training the model. In the comparative analysis, logistic regression gave the best
results, with an accuracy of 86%. It was therefore used for further processing and
prediction, namely predicting the quality and popularity of a product. A data set of
5000 reviews of a smartphone was then loaded for prediction, and the model's
predictions on it were approximately correct.
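
The pipeline summarised above (tokenize, convert to TF-IDF vectors, train a logistic regression classifier, predict polarity) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy reviews, all function names, the smoothed IDF variant, and the training settings (epochs, learning rate) are assumptions made for illustration only. Lemmatization, stop-word removal, and part-of-speech tagging are omitted, and only the Python standard library is used.

```python
import math
from collections import Counter

# Hypothetical toy reviews (1 = positive, 0 = negative); NOT the paper's data set.
TRAIN = [
    ("great phone amazing battery", 1),
    ("excellent camera great screen", 1),
    ("love this phone excellent value", 1),
    ("terrible battery poor screen", 0),
    ("bad camera terrible quality", 0),
    ("poor value bad phone", 0),
]

def tokenize(text):
    """Lowercase and split on whitespace (stand-in for fuller pre-processing)."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for each term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """Turn a tokenized document into an L2-normalised sparse TF-IDF vector."""
    counts = Counter(t for t in doc if t in idf)  # unseen terms are ignored
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(vec, weights, bias):
    """Linear score of a sparse vector under the current model."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())

def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Stochastic gradient descent on the logistic loss (assumed settings)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = sigmoid(score(vec, weights, bias)) - y
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
            bias -= lr * err
    return weights, bias

def predict(text, idf, weights, bias):
    """Return 1 for a predicted positive review, 0 for a negative one."""
    vec = tfidf(tokenize(text), idf)
    return 1 if sigmoid(score(vec, weights, bias)) >= 0.5 else 0

# Fit the vectorizer and the classifier on the toy data, then classify new text.
docs = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
weights, bias = train_logreg([tfidf(d, idf) for d in docs], labels)

print(predict("great excellent amazing", idf, weights, bias))
print(predict("terrible bad poor", idf, weights, bias))
```

In practice a library such as scikit-learn would normally replace the hand-written vectorizer and classifier. The sketch only makes the individual steps explicit.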