Open Access
The Building and Evaluation of a Mobile Parallel Multi-Dialect Speech Corpus for Arabic
Author(s) - Khalid Almeman
Publication year - 2018
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.10.472
Subject(s) - computer science , arabic , sphinx , communication source , natural language processing , process (computing) , speech corpus , speech recognition , word (group theory) , artificial intelligence , crowd sourcing , speech synthesis , linguistics , world wide web , telecommunications , art , philosophy , visual arts , operating system
This paper discusses the process of building and evaluating a mobile parallel multi-dialect speech corpus for Arabic. The methodology for the experiment is as follows: two SIM cards were installed in two mobile phones, one acting as the sender and the other as the receiver. Four different environments were chosen for the receiver: inside the home, in a moving car, in a public place and in a quiet place. By the end of the experiment, a new mobile parallel speech corpus for Arabic dialects had been built. The newly obtained corpus provides the benefits of a large, fully parallel and labelled speech corpus without requiring a large collection and construction effort. The resultant corpus will be made freely available to researchers. To evaluate the resultant corpus, the CMU Sphinx recogniser was used, yielding word error rates (WERs) of 24.3, 17.9, 31.2, 18.7 and 32.0 for multi-dialect, Levantine, Gulf, MSA and Egyptian, respectively.
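For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recogniser's output, divided by the number of reference words. It is not the paper's evaluation pipeline and does not call CMU Sphinx; the function name `word_error_rate` and the example transcripts are hypothetical, and expressing the result as a percentage is an assumption about how the figures above are reported.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER as a percentage (assumed reporting convention).

    Hypothetical helper for illustration; tokenisation is plain
    whitespace splitting, which a real evaluation may not use.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, the WER can exceed 100 when the hypothesis is much longer than the reference.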
