Open Access
SENTIMENT ANALYSIS OF CUSTOMER REVIEWS
Author(s) - Syed Rashiq Nazar, Tapalina Bhattasali
Publication year - 2021
Publication title - Azerbaijan Journal of High Performance Computing
Language(s) - English
Resource type - Journals
eISSN - 2617-4383
pISSN - 2616-6127
DOI - 10.32010/26166127.2021.4.1.113.125
Sentiment analysis is the process of classifying text data as positive, negative, neutral, or into some other category in order to understand the sentiment behind it. The process mainly combines machine learning and natural language processing methods. Customer sentiment can be found in reviews, tweets, comments, and similar text. A company needs to evaluate the sentiment behind the reviews of its products, because customer sentiment is a valuable asset: it ultimately helps the company make better decisions about product marketing and improve product quality. This paper focuses on sentiment analysis of customer reviews from Amazon. The reviews contain textual feedback along with a rating. The aim is to build a supervised machine learning model that classifies each review as positive or negative. Because the reviews are text, they must first be vectorized into a numerical format that the computer can process. To do this, we use the Bag-of-Words model and the TF-IDF (Term Frequency-Inverse Document Frequency) model. The two models are closely related (TF-IDF reweights the Bag-of-Words term counts), and the aim is to determine which one performs better in our case. Since the task is a binary classification problem, we use the logistic regression algorithm. Finally, the performance of the model is evaluated using the F1 score.
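The abstract names the pipeline but does not show it. Below is a minimal, dependency-free Python sketch of the same steps: Bag-of-Words counts, TF-IDF weighting, logistic regression for binary labels, and the F1 score. It assumes plain whitespace tokenization, the smoothed IDF variant idf(t) = ln((1 + n) / (1 + df(t))) + 1 with L2 row normalization (the default behaviour of scikit-learn's `TfidfVectorizer`), and unregularized logistic regression trained by full-batch gradient descent. The helper names (`tokenize`, `build_vocab`, `fit_idf`, `train_logreg`, and so on) and the toy reviews are hypothetical illustrations, not the authors' code or their Amazon data.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split only (no stemming, stop words,
    # or punctuation handling, which a real pipeline would likely add).
    return text.lower().split()

def build_vocab(docs):
    # Map each distinct training token to a column index.
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}

def bag_of_words(docs, vocab):
    # Bag-of-Words: each row holds raw term counts; out-of-vocabulary tokens are dropped.
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            if tok in vocab:
                row[vocab[tok]] = float(count)
        rows.append(row)
    return rows

def fit_idf(train_bow):
    # Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1, where df is
    # the number of training documents that contain the term.
    n = len(train_bow)
    df = [sum(1 for row in train_bow if row[j] > 0) for j in range(len(train_bow[0]))]
    return [math.log((1 + n) / (1 + d)) + 1.0 for d in df]

def tfidf(bow, idf):
    # TF-IDF: scale each count by its IDF weight, then L2-normalise the row.
    rows = []
    for row in bow:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        rows.append([v / norm for v in weighted])
    return rows

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=1000):
    # Binary logistic regression via full-batch gradient descent on mean log-loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    # Predict 1 (positive) when the estimated probability is at least 0.5.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

def f1_score(y_true, y_pred):
    # F1 = harmonic mean of precision and recall for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy reviews (1 = positive, 0 = negative), not the paper's data.
train_docs = ["great product works well", "love it great quality",
              "terrible broke after a day", "bad quality waste of money",
              "works great would buy again", "awful product do not buy"]
train_y = [1, 1, 0, 0, 1, 0]
test_docs = ["great quality works", "terrible waste of money"]
test_y = [1, 0]

# Vocabulary and IDF are fitted on training reviews only, then reused for test reviews.
vocab = build_vocab(train_docs)
bow_train, bow_test = bag_of_words(train_docs, vocab), bag_of_words(test_docs, vocab)
idf = fit_idf(bow_train)
features = {
    "bag-of-words": (bow_train, bow_test),
    "tf-idf": (tfidf(bow_train, idf), tfidf(bow_test, idf)),
}
for name, (X_train, X_test) in features.items():
    w, b = train_logreg(X_train, train_y)
    print(name, "test F1:", f1_score(test_y, predict(X_test, w, b)))
```

Running the script prints one F1 value per representation, which mirrors the kind of comparison the abstract describes; on a six-review toy set the numbers say nothing about real performance. In practice, scikit-learn's `CountVectorizer`, `TfidfVectorizer`, `LogisticRegression`, and `f1_score` cover the same steps with tokenization, regularization, and sparse matrices handled for you.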