Open Access
A Large-Scale Sentiment Data Classification for Online Reviews Under Apache Spark
Author(s) - Samar Al-Saqqa, Ghazi AlNaymat, Arafat Awajan
Publication year - 2018
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.10.166
Subject(s) - computer science, naive bayes classifier, spark (programming language), support vector machine, machine learning, logistic regression, artificial intelligence, sentiment analysis, scalability, classifier (uml), scale (ratio), data mining, metric (unit), database, operations management, physics, quantum mechanics, economics, programming language
Sentiment analysis of large-scale data has become increasingly important and has attracted many researchers, prompting them to adopt new platforms and tools that can handle large volumes of data. In this paper, we present new evaluation experiments on sentiment analysis of a large-scale dataset of online customer reviews using the Apache Spark data processing system. Apache Spark’s scalable machine learning library (MLlib) is used, and three classification techniques from the library are applied: Naive Bayes, support vector machine, and logistic regression. The results are evaluated using the accuracy metric. Experimental results show that the support vector machine classifier outperforms the Naive Bayes and logistic regression classifiers.
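
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes Spark's DataFrame-based `pyspark.ml` API, a TF-IDF feature pipeline (the abstract does not say how features were extracted), and hypothetical input DataFrames with a string column `text` and a 0/1 double column `label`. The small `accuracy` helper shows the metric itself, the fraction of reviews whose predicted sentiment matches the true label.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def compare_classifiers(train_df, test_df):
    """Train three Spark ML classifiers and return {name: accuracy}.

    Assumption: both DataFrames have a string column "text" and a
    0/1 double column "label" (hypothetical schema, not from the paper).
    """
    # Imported here so that accuracy() above is usable without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LinearSVC, LogisticRegression, NaiveBayes
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import HashingTF, IDF, Tokenizer

    # Assumed feature extraction: tokenize, hash term counts, weight by IDF.
    feature_stages = [
        Tokenizer(inputCol="text", outputCol="words"),
        HashingTF(inputCol="words", outputCol="tf"),
        IDF(inputCol="tf", outputCol="features"),
    ]
    # All three estimators read "features"/"label" and write "prediction" by default.
    classifiers = {
        "naive_bayes": NaiveBayes(),
        "linear_svm": LinearSVC(),
        "logistic_regression": LogisticRegression(),
    }
    evaluator = MulticlassClassificationEvaluator(metricName="accuracy")

    results = {}
    for name, classifier in classifiers.items():
        model = Pipeline(stages=feature_stages + [classifier]).fit(train_df)
        results[name] = evaluator.evaluate(model.transform(test_df))
    return results
```

Calling `compare_classifiers(train_df, test_df)` on a train/test split of labeled reviews would return one accuracy value per classifier, which is the kind of comparison the abstract reports. The accuracy figures depend entirely on the data and settings, so none are shown here.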
