Open Access
MORPHOSCRIPT DATA MODEL AND ARABIC MORPHOLOGICAL AUTOMATA
Author(s) - Noureddine Chenfour, Sonia Abdelmoumni
Publication year - 2021
Publication title - Xi'nan Jiaotong Daxue Xuebao (Journal of Southwest Jiaotong University)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.308
H-Index - 21
ISSN - 0258-2724
DOI - 10.35741/issn.0258-2724.56.6.11
Subject(s) - computer science, component (thermodynamics), inheritance (genetic algorithm), automaton, search engine indexing, cellular automaton, process (computing), programming language, compiler, artificial intelligence, natural language processing, theoretical computer science, biochemistry, chemistry, physics, gene, thermodynamics
This paper presents an object-oriented data model for Arabic morphology and an automata-based morphological analyzer built on it. Arabic morphology is too complex to model exhaustively with classical approaches. The first contribution of this paper is therefore a data model that adequately represents Arabic morphological components and the rules for building them. The proposed MorphoScript model is a declarative, object-oriented language that uses classes, inheritance, and aggregation to define the morphological components and all possible morphological links between them. The data model also relies on an annotation indexing system for the semantic enrichment of the morphological knowledge. The second contribution is the compilation of the data model into a deterministic finite-state automaton that represents the morphological knowledge. The resulting AMA (Arabic Morphological Automaton) constitutes the nucleus of the proposed morphological analyzer. As a result, the MorphoScript language allowed us to represent the morphological knowledge base in a readable and highly optimized data model, while the morphological automata generated from the MorphoScript database make morphological analysis very fast, simple, and deterministic. Moreover, the compilation process is fully automatic, so any morphological rule or component can be updated and the compiler rerun to obtain a new version of the automaton.
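The general idea of the abstract (components declared as classes, combined by aggregation, then compiled into a deterministic automaton) can be illustrated with a minimal Python sketch. This is not MorphoScript syntax and not the authors' compiler: the class names `Component`, `Prefix`, `Stem`, `Suffix`, and `WordPattern`, the functions `compile_automaton` and `analyze`, and the transliterated toy forms are all assumptions made for illustration. The automaton here is a plain trie, which is deterministic but not minimized.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Component:
    """Hypothetical base class for a morphological component."""
    forms: list[str]


# Inheritance: each component kind specializes the base class.
class Prefix(Component): ...
class Stem(Component): ...
class Suffix(Component): ...


@dataclass
class WordPattern:
    """Aggregation: a word is built from a prefix, a stem, and a suffix."""
    prefix: Prefix
    stem: Stem
    suffix: Suffix

    def expand(self):
        # Enumerate every surface form together with its segmentation.
        for p, s, x in product(self.prefix.forms, self.stem.forms, self.suffix.forms):
            yield p + s + x, (p, s, x)


def compile_automaton(patterns):
    """Compile patterns into a trie-shaped deterministic automaton.

    States are integers, 0 is the start state. `transitions` maps
    (state, character) to the next state; `accepting` maps a final
    state to the segmentations recognized there.
    """
    transitions, accepting = {}, {}
    next_state = 1
    for pattern in patterns:
        for word, segments in pattern.expand():
            state = 0
            for ch in word:
                key = (state, ch)
                if key not in transitions:
                    transitions[key] = next_state
                    next_state += 1
                state = transitions[key]
            accepting.setdefault(state, []).append(segments)
    return transitions, accepting


def analyze(word, transitions, accepting):
    """Follow one transition per character; no backtracking is needed."""
    state = 0
    for ch in word:
        state = transitions.get((state, ch))
        if state is None:
            return []
    return accepting.get(state, [])


# Toy data in Latin transliteration (assumed, not taken from the paper).
pattern = WordPattern(
    prefix=Prefix(["", "al"]),
    stem=Stem(["kitab", "qalam"]),
    suffix=Suffix(["", "an"]),
)
transitions, accepting = compile_automaton([pattern])
print(analyze("alkitaban", transitions, accepting))
```

Because the automaton is rebuilt entirely from the declarations, changing a component (for example adding a stem) and calling `compile_automaton` again yields an updated automaton, which mirrors the automatic recompilation described in the abstract.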