Sentiment classification of skewed shoppers' reviews using machine learning techniques, examining the textual features

Open Access

Author(s) - Rezapour Mahdi

Publication year - 2021

Publication title - Engineering Reports

Language(s) - English

Resource type - Journals

ISSN - 2577-8196

DOI - 10.1002/eng2.12280

Subject(s) - computer science, sentiment analysis, machine learning, naive Bayes classifier, artificial intelligence, decision tree, data preprocessing, support vector machine, data mining, information retrieval

With the rapid growth of online shopping, it has become crucial for product makers to analyze and manage a wealth of product reviews. However, the high volume of reviews, combined with the wide variety of opinions they express, makes it hard for manufacturers to know exactly how to improve their products without an efficient approach. Sentiment classification addresses this need: its results help customers retrieve the information they need to choose an appropriate product, and help sellers collect customer feedback effectively in order to improve their products. Like most real-world problems, the shopping review data used in this study were imbalanced, consisting predominantly of positive reviews with only a small percentage of negative ones. Machine learning (ML) algorithms tend to perform poorly on imbalanced data because they become biased toward the overrepresented class. The synthetic minority over-sampling technique (SMOTE) was used to address this class imbalance problem. Three ML algorithms were employed: Naïve Bayes (NB), support vector machine (SVM), and decision tree (DT). An extensive preprocessing procedure was applied to prepare the text datasets; its details are discussed in the manuscript. The performance analysis indicated that the DT algorithm outperforms the other two methods. Because positive reviews make up the majority of the data, removing sparse words from the data as a whole eliminated almost all of the sentiment-bearing words in the negative reviews. Hence, the model training process here is performed on positive and negative reviews separately. Other steps taken to improve model performance included combining review titles with their contents, a separate tokenization process, applying various N-gram sizes, and retaining stop words such as “not” and “but”.
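To make the preprocessing ideas in the abstract concrete (merging a review's title with its content, keeping negation and contrast words, and building N-grams), here is a minimal sketch in plain Python. The stop-word list, the regular-expression tokenizer, and the function names `preprocess` and `ngrams` are all hypothetical; the abstract does not specify the actual tokenizer or stop-word list used in the study.

```python
import re

# Hypothetical, deliberately small stop-word list. "not" and "but" are left
# out on purpose so that negation and contrast survive preprocessing.
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "and", "of", "to", "was"}

def preprocess(title, body):
    """Merge a review's title with its body, lowercase, tokenize, drop stop words."""
    text = f"{title} {body}".lower()
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams for n in [n_min, n_max], each joined with spaces."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams

tokens = preprocess("Not good", "The battery is not what I expected, but it was cheap")
print(tokens)  # ['not', 'good', 'battery', 'not', 'what', 'i', 'expected', 'but', 'cheap']
print("not good" in ngrams(tokens))  # True
```

Keeping "not" in the token stream is what lets the bigram "not good" exist at all; a standard stop-word list that removes "not" would reduce the title to the misleading unigram "good".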
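The abstract's point about sparse-word removal can be illustrated with a toy example. Below, a term is treated as "sparse" if it appears in fewer than a given fraction of documents; the threshold of 0.2, the toy corpus, and the function name `frequent_terms` are assumptions for illustration only, not the study's actual settings. Computed over the pooled, skewed corpus, the threshold discards every word that only negative reviews use; computed on the negative reviews alone, those words survive.

```python
from collections import Counter

def frequent_terms(docs, min_doc_frac):
    """Return tokens that occur in at least `min_doc_frac` of `docs`.

    Tokens below the threshold are the "sparse" terms that would be dropped.
    """
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {tok for tok, count in doc_freq.items() if count / len(docs) >= min_doc_frac}

# Toy, skewed corpus: 18 positive reviews and only 2 negative ones.
positive = [["great", "product"]] * 18
negative = [["broken", "refund"], ["broken", "return"]]

pooled = frequent_terms(positive + negative, min_doc_frac=0.2)
per_class = frequent_terms(negative, min_doc_frac=0.2)
print(sorted(pooled))     # ['great', 'product']
print(sorted(per_class))  # ['broken', 'refund', 'return']
```

In the pooled corpus, "broken" appears in only 2 of 20 documents (10%), so it falls below the threshold even though it appears in every negative review; handling each class separately avoids this.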
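SMOTE rebalances the classes by creating synthetic minority samples rather than duplicating existing ones: each new point lies at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The sketch below implements that interpolation in plain Python on numeric feature vectors. The function name `smote`, the choice of a random base sample for each new point, and the toy 2-D vectors (standing in for vectorized negative reviews) are assumptions; in practice a library implementation such as imbalanced-learn's `SMOTE` would typically be used, and the abstract does not say which implementation or parameters the study used.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class vectors by SMOTE-style interpolation.

    minority: feature vectors (lists of floats) belonging to the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Its k nearest minority neighbours by Euclidean distance, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        # Place the new point at a random position on the segment base -> other.
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Toy 2-D vectors standing in for vectorized negative reviews.
negatives = [[0.0, 1.0], [0.2, 0.8], [0.1, 1.2], [0.3, 0.9]]
new_points = smote(negatives, n_new=6, k=2)
print(len(new_points))  # 6
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region the minority class already occupies, which is why SMOTE tends to generalize better than simple duplication.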
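Of the three classifiers compared, Naïve Bayes is the easiest to show in full. The sketch below is a multinomial Naïve Bayes with add-alpha (Laplace) smoothing over token lists; the multinomial variant, the smoothing value, the function names `train_nb` and `predict_nb`, and the toy data are assumptions, since the abstract does not state which NB variant was used. SVM and decision tree would usually come from a library such as scikit-learn and are not reproduced here.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha smoothing on token lists."""
    vocab = {tok for doc in docs for tok in doc}
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        token_counts[y].update(doc)
    model = {}
    for y in class_docs:
        total = sum(token_counts[y].values())
        denom = total + alpha * len(vocab)
        log_prior = math.log(class_docs[y] / len(docs))
        # Smoothed log-likelihood of each vocabulary token given class y.
        log_lik = {t: math.log((token_counts[y][t] + alpha) / denom) for t in vocab}
        model[y] = (log_prior, log_lik)
    return model

def predict_nb(model, doc):
    """Return the class with the highest posterior log-score for `doc`."""
    def score(y):
        log_prior, log_lik = model[y]
        # Tokens never seen in training are ignored.
        return log_prior + sum(log_lik[t] for t in doc if t in log_lik)
    return max(model, key=score)

docs = [["great", "product"], ["love", "it"], ["great", "value"],
        ["broken", "refund"], ["not", "working"]]
labels = ["pos", "pos", "pos", "neg", "neg"]
model = train_nb(docs, labels)
print(predict_nb(model, ["broken"]))  # neg
print(predict_nb(model, ["great"]))   # pos
```

Working in log space avoids numerical underflow when many small per-token probabilities are multiplied, and the smoothing keeps a single unseen-in-class token from driving a class score to zero.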
