AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP

Abu Bakr Soliman, Kareem Eissa, Samhaa R. El-Beltagy


Open Access


Author(s) - Abu Bakr Soliman, Kareem Eissa, Samhaa R. El-Beltagy

Publication year - 2017

Publication title - Procedia Computer Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.334

H-Index - 76

ISSN - 1877-0509

DOI - 10.1016/j.procs.2017.10.117

Subject(s) - computer science, word embedding, natural language processing, artificial intelligence, word (group theory), embedding, preprocessor, set (abstract data type), arabic, representation (politics), linguistics, philosophy, programming language, politics, political science, law

Advances in neural networks have driven progress in fields such as computer vision, speech recognition, and natural language processing (NLP). One of the most influential recent developments in NLP is the use of word embeddings, in which words are represented as vectors in a continuous space that captures many of the syntactic and semantic relations among them. AraVec is an open-source project of pre-trained distributed word representations (word embeddings) that aims to provide the Arabic NLP research community with free-to-use, powerful word embedding models. The first version of AraVec provides six word embedding models built on three Arabic content domains: tweets, World Wide Web pages, and Arabic Wikipedia articles. More than 3,300,000,000 tokens in total were used to build the models. This paper describes the resources used to build the models, the data cleaning techniques, the preprocessing step, and the details of the word embedding creation techniques.
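To make concrete the idea of words as vectors in a continuous space, here is a minimal sketch of how closeness between word vectors is commonly measured with cosine similarity. The three-dimensional vectors and the helper names `cosine_similarity` and `most_similar` are hypothetical, invented purely for illustration; they are not taken from the AraVec models, which would supply learned, much higher-dimensional vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (assumption: made up for this example,
# not produced by any trained model).
toy_embeddings = {
    "كتاب": [0.9, 0.1, 0.0],   # "book"
    "مجلة": [0.8, 0.2, 0.1],   # "magazine"
    "سيارة": [0.0, 0.2, 0.9],  # "car"
}


def most_similar(word, embeddings):
    """Return the other vocabulary word whose vector is closest to `word`."""
    query = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine_similarity(query, embeddings[w]))


if __name__ == "__main__":
    # With these toy vectors, "book" lies nearer to "magazine" than to "car".
    print(most_similar("كتاب", toy_embeddings))
```

With real pre-trained embeddings the same comparison would be run over the model's full vocabulary, so that semantically related words surface as nearest neighbours.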
