Using Word Embedding and Ensemble Learning for Highly Imbalanced Data Sentiment Analysis in Short Arabic Text

Sadam Al-Azani, El-Sayed M. El-Alfy

Open Access

Author(s) - Sadam Al-Azani, El-Sayed M. El-Alfy

Publication year - 2017

Publication title - Procedia Computer Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.334

H-Index - 76

ISSN - 1877-0509

DOI - 10.1016/j.procs.2017.05.365

Subject(s) - computer science, sentiment analysis, word embedding, word (group theory), artificial intelligence, natural language processing, arabic, embedding, baseline (sea), ensemble learning, machine learning, linguistics, philosophy, oceanography, geology

Sentiment analysis has grown in importance with the massive increase in online content. Although several studies have been conducted for Western languages, comparatively little has been done for the Arabic language. The purpose of this study is to compare the performance of different classifiers for polarity determination in highly imbalanced short-text datasets, using features learned by word embedding rather than hand-crafted features. Several base classifiers and ensembles have been investigated with and without SMOTE (Synthetic Minority Over-sampling Technique). Using a dataset of tweets in dialectal Arabic, the results show that applying word embedding with ensembles and SMOTE can achieve more than 15% improvement on average in F1 score over the baseline. The F1 score, the harmonic mean of precision and recall, is considered a better performance measure than accuracy for imbalanced datasets.
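
The abstract's last point, that F1 is more informative than accuracy when classes are highly imbalanced, can be illustrated with a small sketch. The labels below are hypothetical toy data, not the paper's tweets, and the helper names (`precision_recall_f1`, `accuracy`) are chosen here for illustration only.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) computed for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy labels (assumption, not from the paper):
# 95 majority-class examples (0) followed by 5 minority-class examples (1).
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts the majority class.
always_majority = [0] * 100

# A classifier that recovers 4 of the 5 minority examples
# at the cost of 6 false alarms on the majority class.
y_better = [0] * 89 + [1] * 6 + [1] * 4 + [0] * 1

acc_majority = accuracy(y_true, always_majority)
f1_majority = precision_recall_f1(y_true, always_majority)[2]
acc_better = accuracy(y_true, y_better)
f1_better = precision_recall_f1(y_true, y_better)[2]

print(f"always-majority: accuracy={acc_majority:.2f}, F1={f1_majority:.2f}")
print(f"minority-aware:  accuracy={acc_better:.2f}, F1={f1_better:.2f}")
```

In this toy setting the always-majority classifier has the higher accuracy even though it never detects a minority example, while its minority-class F1 is zero. The second classifier scores lower on accuracy but clearly higher on F1, which is the behaviour that motivates reporting F1 on imbalanced data.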
