A spark‐based big data analysis framework for real‐time sentiment prediction on streaming data

Author(s) - Kılınç Deniz

Publication year - 2019

Publication title - Software: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.437

H-Index - 70

eISSN - 1097-024X

pISSN - 0038-0644

DOI - 10.1002/spe.2724

Subject(s) - sentiment analysis, computer science, big data, spark (programming language), naive bayes classifier, context (archaeology), service (business), dashboard, data mining, data science, machine learning, artificial intelligence, information retrieval, support vector machine, paleontology, economy, biology, economics, programming language

Summary

Many data sources produce large volumes of data, and the Big Data nature of that output requires new distributed processing approaches to extract valuable information. Real‐time sentiment analysis is one of the most demanding research areas and requires powerful Big Data analytics tools such as Spark. Prior literature surveys have shown that, although there are many conventional sentiment analysis studies, only a few perform sentiment analysis in real time.

One major factor that affects the quality of real‐time sentiment analysis is the trustworthiness of the generated data. More precisely, it is a valuable research question to determine whether the account owner who expresses a sentiment is genuine. Since data generated by fake accounts may decrease the accuracy of the outcome, an intelligent service that can assess the source of the data is a key part of the analysis. For this reason, we add a fake account detection service to the proposed framework. Both the sentiment analysis and the fake account detection systems are trained and tested using the Naïve Bayes model from Apache Spark's machine learning library.

The developed system consists of four integrated software components: (i) a machine learning and streaming service for sentiment prediction, (ii) a Twitter streaming service to retrieve tweets, (iii) a Twitter fake account detection service to assess the owner of each retrieved tweet, and (iv) a real‐time reporting and dashboard component to visualize the results of sentiment analysis. The sentiment classification performance of the system is 86.77% in offline mode and 80.93% in real‐time mode.
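The flow described in the summary (train a Naïve Bayes sentiment model, then score incoming tweets while skipping those from accounts flagged as fake) can be illustrated with a small, dependency-free Python sketch. This is not the paper's implementation: the framework uses the Naïve Bayes model from Apache Spark's machine learning library, whereas the `MultinomialNB` class below is a hand-written stand-in. The `score_batch` helper, the `is_fake` predicate, the whitespace tokenizer, and the toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Tiny multinomial Naive Bayes with Laplace smoothing (illustrative stand-in)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.class_counts = Counter()            # number of training texts per label
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()        # assumed tokenizer: whitespace split
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = (sum(self.word_counts[label].values())
                     + self.smoothing * len(self.vocab))
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + self.smoothing) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def score_batch(model, tweets, is_fake):
    """Label one micro-batch, skipping tweets whose author is flagged as fake.

    `is_fake` is a hypothetical predicate standing in for the fake account
    detection service; the paper trains a second Naive Bayes model for it.
    """
    return [(t["text"], model.predict(t["text"]))
            for t in tweets if not is_fake(t["user"])]


# Toy training data, invented for this example.
train_texts = ["great phone love it", "love this great service",
               "terrible battery hate it", "hate this awful service"]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNB().fit(train_texts, train_labels)

# One simulated micro-batch of incoming tweets.
batch = [
    {"user": "alice", "text": "love the great battery"},
    {"user": "bot42", "text": "great great great"},
    {"user": "carol", "text": "awful phone hate it"},
]
results = score_batch(model, batch, is_fake=lambda user: user.startswith("bot"))
print(results)
```

In this sketch the tweet from `bot42` is dropped before scoring, so only the two remaining tweets receive sentiment labels. In a Spark deployment, the batch loop would be replaced by a streaming job and the classifier by the library's own Naïve Bayes model.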
