
Open Access
Sentiment classification of skewed shoppers' reviews using machine learning techniques, examining the textual features
Author(s)
Rezapour Mahdi
Publication year
2021
Publication title
Engineering Reports
Resource type
Journals
Publisher
John Wiley & Sons
Abstract
With the rapid growth of online shopping, it has become crucial for product makers to analyze and handle a wealth of product reviews. However, the high volume of reviews, along with the wide variety of opinions they contain, makes it hard for manufacturers to know exactly how to improve their products without an efficient approach. The results of sentiment classification would help customers retrieve the information they need to choose an appropriate product, and help sellers collect customer feedback effectively in order to improve their products. Like most real‐world problems, the shopping review data used in this study were imbalanced, being composed predominantly of positive reviews with only a small percentage of negative reviews. Machine learning (ML) algorithms do not perform well when data are imbalanced, as they tend to become biased toward the overrepresented category. The synthetic minority over‐sampling technique (SMOTE) was used to address this class imbalance problem. In this study, three ML‐based algorithms were employed: Naïve Bayes (NB), support vector machine, and decision tree (DT). An extensive preprocessing procedure was used to prepare the text datasets; details are discussed in the manuscript. The performance analysis indicated that the DT algorithm outperformed the other two methods. Because positive reviews account for the majority of the reviews, removing sparse words from the data removed almost all of the negative reviews' sentiments. Hence, the model training process is performed on positive and negative reviews separately. Combining the review titles with their contents, tokenizing separately, applying various N‐grams, and retaining stop words (e.g., “not” or “but”) were some of the other steps considered to improve the performance of the model.
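The oversampling idea named in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: it is a pure-Python approximation of SMOTE-style interpolation, and the function name `smote`, the parameters `k` and `seed`, and the toy feature vectors are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (assumed): 9 positive reviews vs. 3 negative review vectors.
n_positive = 9
negative = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [1.0, 1.0, 3.0]]

# Oversample the negative class until both classes have the same size.
extra = smote(negative, n_positive - len(negative))
balanced_negative = negative + extra
print(len(balanced_negative))
```

In practice, a maintained library implementation would likely be preferable to this sketch; the point here is only that the synthetic samples are interpolations of existing minority samples rather than duplicates.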
Subject(s)
artificial intelligence, class (philosophy), computer science, data mining, data pre processing, decision tree, geometry, machine learning, mathematical analysis, mathematics, naive bayes classifier, operating system, preprocessor, process (computing), product (mathematics), sentiment analysis, support vector machine, tree (set theory), variety (cybernetics)
Language(s)
English
ISSN
2577-8196
DOI
10.1002/eng2.12280
