Word Sense Disambiguation for Arabic Exploiting Arabic WordNet and Word Embedding
Author(s) -
Ali Alkhatlan,
Jugal Kalita,
Ahmed Alhaddad
Publication year - 2018
Publication title -
Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.10.460
Subject(s) - wordnet, computer science, natural language processing, word2vec, artificial intelligence, word (group theory), word embedding, semeval, context (archaeology), arabic, semantic similarity, polysemy, task (project management), linguistics, embedding, paleontology, philosophy, management, economics, biology
Word Sense Disambiguation (WSD) is the task of identifying the meaning of a word given its context. This problem has been investigated and analyzed in depth for English. However, work on Arabic has been limited despite the fact that there are half a billion native Arabic speakers. In this work, we present multiple approaches to WSD in Arabic that build on recent developments and successes in learning word embeddings with approaches such as GloVe and Word2vec. The primary shortcoming of word embeddings is that each word's meaning is represented by a single vector, although many words are polysemous. Our main contribution in this work is to computationally obtain an embedding for each sense, using Arabic WordNet (AWN), to address the problem of WSD. We also compute word semantic similarity, taking multiple Arabic stemming algorithms into account. Finally, we make available a large pre-processed corpus that is ready to be used for further experiments and a WSD test dataset based on AWN, seeking to fill gaps in Arabic NLP (ANLP) compared to English.
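The idea of one embedding per sense can be illustrated with a minimal sketch. The abstract does not describe how the sense embeddings are actually computed, so everything below is an assumption made for illustration: each sense vector is taken to be the average of the word vectors of a few words related to that sense (as one might collect from a WordNet synset or gloss), and the sense whose vector has the highest cosine similarity to the averaged context vector is chosen. The toy two-dimensional vectors, the sense identifiers, and the function names are all hypothetical; English tokens stand in for Arabic ones, with the ambiguous word modeled on عين, which can mean "eye" or "water spring".

```python
import math


def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding; None if none do."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(context_words, sense_inventory, embeddings):
    """Return the sense id whose (assumed) sense vector is closest to the context.

    sense_inventory maps a sense id to a list of words related to that sense.
    Returns None when the context has no known words or no sense can be scored.
    """
    context_vec = mean_vector(context_words, embeddings)
    if context_vec is None:
        return None
    best_sense, best_score = None, -2.0  # cosine is never below -1
    for sense_id, related_words in sense_inventory.items():
        sense_vec = mean_vector(related_words, embeddings)
        if sense_vec is None:
            continue
        score = cosine(context_vec, sense_vec)
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense


# Hypothetical toy embeddings (real ones would come from Word2vec or GloVe).
embeddings = {
    "water": [1.0, 0.0],
    "river": [0.9, 0.1],
    "drink": [0.8, 0.2],
    "eye": [0.0, 1.0],
    "see": [0.1, 0.9],
}

# Hypothetical sense inventory for one ambiguous word with two senses.
senses = {
    "ayn_spring": ["water", "river"],
    "ayn_eye": ["eye", "see"],
}

chosen = disambiguate(["drink", "water"], senses, embeddings)
```

With this toy data, a context about drinking water selects the "spring" sense, while a context containing only "see" selects the "eye" sense. Averaging is only one of several ways to build a sense vector, and it is used here because it is the simplest to show.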