Open Access
Speech Translation from Darija to Classical Arabic: Performance Analysis of Whisper, SeamlessM4T, and S2T Models
Author(s) - Maria Labied, Abdessamad Belangour, Mouad Banane
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3572611
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
This study evaluates the performance of advanced speech-to-text translation models—Whisper large-v3, SeamlessM4T, and S2T—fine-tuned on the Darija-C corpus to translate Darija speech into Classical Arabic text. Darija, a widely spoken Arabic dialect, presents significant challenges for automated translation due to its linguistic complexity and lack of standardized resources. The fine-tuning process adapted each model to the nuances of Darija speech, aiming to optimize translation quality and linguistic alignment with Classical Arabic. The primary goal of this research is to identify the model achieving the highest BLEU score, which serves as a benchmark for translation accuracy and fluency. Comprehensive experiments highlight the comparative strengths and limitations of these models, with Whisper large-v3 emerging as a strong contender alongside SeamlessM4T and S2T. Results reveal the critical role of fine-tuning in improving the performance of pre-trained models on low-resource dialects. This work contributes to the development of effective speech translation systems for Arabic dialects, offering insights into optimizing model architectures for underrepresented languages and enhancing their practical applicability in real-world scenarios.
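The abstract uses BLEU as the benchmark for ranking the fine-tuned models but does not say which BLEU implementation, tokenization, or smoothing was used. The following is therefore only a hypothetical, simplified corpus-level BLEU sketch. It assumes uniform 1- to 4-gram weights, one reference translation per segment, whitespace tokenization, and no smoothing. Its scores will not necessarily match those reported with a standard toolkit. It illustrates how clipped n-gram precision and the brevity penalty combine into a single score.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU (0-100).

    Assumptions (not taken from the paper): one reference per hypothesis,
    whitespace tokenization, uniform n-gram weights, no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram is credited at most as many
            # times as it appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any zero n-gram precision makes BLEU zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["ذهب الولد إلى المدرسة صباحا"]
    refs = ["ذهب الولد إلى المدرسة صباحا"]
    print(corpus_bleu(hyps, refs))  # identical output and reference -> 100.0
```

Because the score is computed over the whole test set rather than averaged per sentence, the results for different models (for example Whisper large-v3 versus SeamlessM4T) are comparable only when they share the same references, tokenization, and BLEU settings.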
