Sentiment Analysis on E-commerce Product using Machine Learning and Combination of TF-IDF and Backward Elimination
Author(s) - Tommy Willianto, Supryadi, Antoni Wibowo
Publication year - 2020
Publication title - International Journal of Recent Technology and Engineering
Language(s) - English
Resource type - Journals
ISSN - 2277-3878
DOI - 10.35940/ijrte.f7889.038620
Subject(s) - sentiment analysis, naive bayes classifier, computer science, random forest, tf–idf, feature selection, support vector machine, artificial intelligence, product (mathematics), machine learning, decision tree, feature (linguistics), purchasing, selection (genetic algorithm), data mining, natural language processing, engineering, mathematics, linguistics, philosophy, physics, geometry, operations management, quantum mechanics, term (time)
An e-commerce platform is a website or mobile application that helps people buy products. Before purchasing, customers often decide whether to buy a product by reading reviews from previous buyers. The problem is that a product can have so many reviews that reading them all takes a long time. This research uses sentiment analysis to classify the review data. Sentiment analysis, or opinion mining, is a machine learning approach to classifying and analysing texts or documents according to the sentiments, emotions, and opinions they express. In this research, sentiment analysis is used to classify product reviews from e-commerce websites into positive or negative classes. The results can be processed further and used to summarize customers' opinions about a product without reading every single review. The goal of this research is to optimize classification performance by using a feature selection technique. Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction, Backward Elimination feature selection, and five classifiers (Naïve Bayes, Support Vector Machine, K-Nearest Neighbour, Decision Tree, and Random Forest) were used to analyse the sentiment of the reviews. The dataset consists of Indonesian-language reviews labelled with two classes (positive and negative). The best accuracy, 85.97%, was achieved with TF-IDF, Backward Elimination, and Support Vector Machine (SVM), an increase of 7.91% compared with the same process without feature selection. Based on the results, Backward Elimination feature selection improved performance for all classifiers used in this research.
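
The pipeline described in the abstract (TF-IDF feature extraction, then Backward Elimination, then a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. The six toy reviews are invented for illustration, the TF-IDF weighting (term frequency times log(N/df) + 1) is one common variant chosen as an assumption, and a simple nearest-centroid classifier scored by leave-one-out accuracy stands in for the five classifiers named above so that the example runs with the Python standard library alone.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    # Assumed weighting variant: idf = log(N / df) + 1.
    idf = {t: math.log(n_docs / doc_freq[t]) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] / len(toks) * idf[t] for t in vocab])
    return vocab, rows

def centroid_predict(train_X, train_y, x, features):
    """Nearest-centroid prediction using only the given feature indices."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [r for r, lab in zip(train_X, train_y) if lab == label]
        dist = 0.0
        for j in features:
            centroid_j = sum(r[j] for r in members) / len(members)
            dist += (x[j] - centroid_j) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the stand-in classifier on a feature subset."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if centroid_predict(train_X, train_y, X[i], features) == y[i]:
            correct += 1
    return correct / len(X)

def backward_elimination(X, y):
    """Start from all features; repeatedly drop the one whose removal scores best,
    stopping as soon as every possible removal would lower the accuracy."""
    selected = list(range(len(X[0])))
    best = loo_accuracy(X, y, selected)
    while len(selected) > 1:
        trials = [(loo_accuracy(X, y, [k for k in selected if k != j]), j)
                  for j in selected]
        score, drop = max(trials)
        if score < best:
            break
        selected.remove(drop)
        best = score
    return selected, best

# Hypothetical toy reviews (not from the paper's dataset).
reviews = [
    "barang bagus pengiriman cepat",   # good item, fast delivery
    "kualitas bagus sekali",           # very good quality
    "penjual ramah barang bagus",      # friendly seller, good item
    "barang rusak sangat kecewa",      # broken item, very disappointed
    "kualitas jelek kecewa",           # bad quality, disappointed
    "pengiriman lama barang jelek",    # slow delivery, bad item
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

vocab, X = tfidf_matrix(reviews)
baseline = loo_accuracy(X, labels, list(range(len(vocab))))
selected, final = backward_elimination(X, labels)

print("accuracy with all features:", baseline)
print("accuracy after elimination:", final)
print("kept terms:", [vocab[j] for j in selected])
```

Because a feature is dropped only when the score does not decrease, the accuracy after elimination in this sketch can never be lower than the baseline on the data used for scoring. In practice the selection would be scored on held-out data, since the number of candidate subsets grows quickly with vocabulary size and selecting on the evaluation set overstates the gain.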
