A Monolingual Parallel Corpus of Arabic
Author(s) -
Fatima Al-Raisi,
Weijian Lin,
Abdelwahab Bourai
Publication year - 2018
Publication title -
Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.10.487
Subject(s) - computer science, Arabic, natural language processing, parallel corpora, artificial intelligence, sequence (biology), speech recognition, machine translation, linguistics, philosophy, biology, genetics
We present the first monolingual parallel corpus of Arabic: a corpus of full-sentence Arabic pairs, automatically generated by translating a parallel bilingual corpus. We provide several versions of the dataset, varying in size. The corpus can be used to train sequence-to-sequence models for paraphrasing and other language generation tasks.
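The generation step mentioned in the abstract (translating one side of a bilingual corpus to obtain Arabic sentence pairs) can be illustrated with a minimal sketch. This is not the authors' pipeline. The function name `build_paraphrase_pairs`, the `translate_to_arabic` callable, the assumption that the other language is English, and the filtering of empty or identical outputs are all hypothetical choices made for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def build_paraphrase_pairs(
    bilingual_pairs: Iterable[Tuple[str, str]],
    translate_to_arabic: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each Arabic sentence with an Arabic translation of its counterpart.

    bilingual_pairs: (arabic_sentence, foreign_sentence) tuples.
    translate_to_arabic: hypothetical stand-in for any machine translation system.
    """
    pairs: List[Tuple[str, str]] = []
    for arabic, foreign in bilingual_pairs:
        original = arabic.strip()
        candidate = translate_to_arabic(foreign).strip()
        # Assumed filter: drop empty outputs and exact copies of the original,
        # since neither gives a usable paraphrase pair.
        if not candidate or candidate == original:
            continue
        pairs.append((original, candidate))
    return pairs


# Toy stand-in for a translation system: a lookup table, for demonstration only.
toy_mt = {"Hello": "أهلا", "Thanks": "شكرا"}
corpus = [("مرحبا", "Hello"), ("شكرا", "Thanks")]
pairs = build_paraphrase_pairs(corpus, lambda s: toy_mt.get(s, ""))
```

In this toy run the second pair is discarded because the translation is identical to the original sentence. A real setup would replace the lookup table with an actual translation model and would likely need further quality filtering.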