Open Access
A Semi-Automatic and Low Cost Approach to Build Scalable Lemma-based Lexical Resources for Arabic Verbs
Author(s) -
Noureddine Doumi,
Ahmed Lehireche,
Denis Maurel,
Ahmed Abdelalí
Publication year - 2016
Publication title -
International Journal of Information Technology and Computer Science
Language(s) - English
Resource type - Journals
eISSN - 2074-9015
pISSN - 2074-9007
DOI - 10.5815/ijitcs.2016.02.01
Subject(s) - computer science, scalability, natural language processing, artificial intelligence, inflection, verb, Arabic, Modern Standard Arabic, lemma, database, linguistics
This work presents a method that enables the Arabic NLP community to build scalable lexical resources. In addition to being scalable and extensible, the method is low cost and time efficient. Its extensibility comes from being incremental in both aspects: processing resources and generating lexicons. Starting from a corpus, tokens are first drawn from the corpus and lemmatized. Second, finite state transducers (FSTs) are generated semi-automatically. Finally, the FSTs are used to produce all possible inflected verb forms with their full morphological features. One strength of the algorithm is its ability to generate transducers with 184 transitions, which would be very cumbersome to design manually. A second strength is a new inflection scheme for Arabic verbs, which increases the efficiency of the FST generation algorithm. The experiments use a representative corpus of Modern Standard Arabic, from which 171 transducers were generated semi-automatically. The resulting open lexical resources have high coverage: they cover more than 70% of Arabic verbs. The resources contain 16,855 verb lemmas and 11,080,355 fully vocalized, partially vocalized, and unvocalized inflected verb forms. All these resources are being made public and are currently used as an open package in the Unitex framework, available under the LGPL license.
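The final step, using transducers to expand each lemma into its inflected forms, can be pictured with a minimal sketch. It is not the authors' algorithm. It rests on a hypothetical simplification: each path of an inflection transducer is reduced to "strip n final characters, then append a suffix". Real transducers, such as the 184-transition ones mentioned above, also add prefixes and rewrite stem-internal vowels, and this toy ignores both. Words are written in Buckwalter transliteration for readability. The inflectional codes (`A1s`, `A3fs`, ...) and the helper names are made up for illustration. The output lines only imitate the general shape of a Unitex DELAF entry (`inflected,lemma.V:codes`). Only fully vocalized and unvocalized variants are produced, not partially vocalized ones.

```python
from dataclasses import dataclass

# Buckwalter diacritics: short vowels (a, u, i), sukun (o), shadda (~),
# and tanween (F, N, K). Removing them yields the unvocalized spelling.
DIACRITICS = set("auio~FNK")


@dataclass(frozen=True)
class InflectionPath:
    """Hypothetical stand-in for one path through an inflection transducer."""
    strip: int   # number of final characters removed from the lemma
    suffix: str  # characters appended to the remaining stem
    codes: str   # hypothetical inflectional codes (aspect, person, gender, number)


# Toy "transducer" for a sound triliteral verb in the perfective aspect
# (lemma given as the 3rd person masculine singular, e.g. "kataba").
PERFECT_PATHS = [
    InflectionPath(1, "tu", "A1s"),
    InflectionPath(1, "ta", "A2ms"),
    InflectionPath(1, "ti", "A2fs"),
    InflectionPath(0, "", "A3ms"),
    InflectionPath(1, "at", "A3fs"),
]


def inflect(lemma: str, paths: list[InflectionPath]) -> list[tuple[str, str]]:
    """Apply every path to the lemma and return (inflected form, codes) pairs."""
    forms = []
    for path in paths:
        stem = lemma[: len(lemma) - path.strip]
        forms.append((stem + path.suffix, path.codes))
    return forms


def unvocalize(form: str) -> str:
    """Drop all Buckwalter diacritics from a fully vocalized form."""
    return "".join(ch for ch in form if ch not in DIACRITICS)


def to_dictionary_lines(lemma: str, paths: list[InflectionPath]) -> list[str]:
    """Emit DELAF-like lines for the vocalized and unvocalized variants."""
    lines = []
    for form, codes in inflect(lemma, paths):
        # dict.fromkeys removes the duplicate when a form has no diacritics,
        # while keeping the vocalized variant first.
        for variant in dict.fromkeys([form, unvocalize(form)]):
            lines.append(f"{variant},{lemma}.V:{codes}")
    return lines


if __name__ == "__main__":
    for line in to_dictionary_lines("kataba", PERFECT_PATHS):
        print(line)
```

For the lemma `kataba` ("to write"), the five paths yield ten lines, such as `katabtu,kataba.V:A1s` and its unvocalized counterpart `ktbt,kataba.V:A1s`. Because one path table serves every lemma of the same inflection class, a single generated transducer can expand many lemmas. This illustrates why generating a modest number of transducers (171 in the experiments) can account for millions of inflected forms.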
