Open Access
MARSA: Multi-Domain Arabic Resources for Sentiment Analysis
Author(s) - Areeb Alowisheq, Nora Al-Twairesh, Mawaheb Altuwaijri, Afnan Almoammar, Alhanouf Alsuwailem, Tarfa Albuhairi, Wejdan Alahaideb, Sarah Alhumoud
Publication year - 2021
Publication title - IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2021.3120746
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
The Arabic language has many spoken dialects. Until recently, however, it was written primarily in Modern Standard Arabic (MSA), the formal variant of Arabic. Social media platforms have changed the face of written Arabic: users now converse freely in various dialects, offering a massive number of resources for the study of dialectal text. The Arabic dialects differ from MSA in morphology, syntax, and phonetics. Consequently, since the effectiveness of NLP tasks such as sentiment analysis depends on the availability of representative resources, there is currently a great need for such resources in these dialects. In this paper, we present MARSA, the largest sentiment-annotated corpus for Dialectal Arabic (DA) in the Gulf region, which consists of 61,353 manually labeled tweets containing a total of 840K tokens. The tweets were collected from trending hashtags in four domains (political, social, sports, and technology) to create a multi-domain corpus. Such a corpus facilitates the study of domain-dependent sentiment analysis in Arabic. In addition to this corpus, the annotators extracted indicator words to form affect lexicons for each domain. We draw insights from these lexicons regarding the contextual polarity of certain words. Furthermore, we present benchmark experiments on the MARSA corpus to establish a baseline for further studies.
