Open Access
Robust Real-Time Arabic Speech Recognition for UAVs in Adverse Acoustic Conditions Using Lightweight CNNs
Author(s) - Khaoula El Manaa, Naouar Laaidi, Yassine Abouch, Hassan Satori
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3622024
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
This work introduces an efficient and noise-robust automatic speech recognition (ASR) framework for real-time unmanned aerial vehicle (UAV) control using Modern Standard Arabic (MSA). The proposed approach tackles two major obstacles: the limited availability of Arabic speech resources and the harsh acoustic conditions caused by UAV operation, where propeller, wind, and surrounding noises often impair recognition accuracy. To address these challenges, we enhance a convolutional neural network (CNN) with Squeeze-and-Excitation (SE) attention modules, allowing the model to highlight task-relevant speech cues while attenuating noise-related artifacts. Training was carried out on a purpose-built dataset of 8,800 MSA drone command utterances, with experiments conducted under both clean and noise-augmented conditions. Noise augmentation included three representative disturbances (propeller, wind, and wave) at signal-to-noise ratios (SNRs) of 0, 10, and 20 dB. The system was further validated across seven SNR levels ranging from –5 dB to 30 dB and tested with an unseen noise source originating from a Caterpillar C18 generator. Results show that the clean-trained baseline achieved 98.64% accuracy on the clean test set, while noise-augmented training slightly boosted accuracy to 98.78% and markedly improved robustness. Under 0 dB SNR with the unseen generator noise, the proposed method delivered a 50.55% absolute accuracy improvement over the baseline. With an inference latency of only 0.0256 seconds, the system ensures real-time responsiveness, achieving an effective compromise between recognition performance, computational cost, and resilience to noise, thereby demonstrating its potential for reliable UAV command and control in adverse acoustic conditions.
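
The abstract does not spell out how the noise-augmented training data were produced. As a rough illustration only, the sketch below shows the usual way a noise clip is scaled so that a speech-plus-noise mixture reaches a target SNR, using the relation SNR_dB = 10 log10(P_speech / P_noise). The function names (`mean_power`, `mix_at_snr`, `measured_snr_db`), the synthetic sine "speech", and the Gaussian "noise" are all assumptions made for this example; they are not taken from the paper, which used recorded propeller, wind, and wave noise.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a 1-D signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + noise has the requested SNR (in dB).

    Returns (mixture, scaled_noise). Hypothetical helper, not the authors' code.
    """
    if len(noise) < len(speech):
        # Loop the noise clip so it covers the whole utterance.
        noise = noise * math.ceil(len(speech) / len(noise))
    noise = noise[:len(speech)]

    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise clip is silent")

    # SNR_dB = 10*log10(P_speech / (g^2 * P_noise)); solve for the gain g.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


def measured_snr_db(speech, scaled_noise):
    """SNR in dB computed from the clean speech and the noise actually added."""
    return 10 * math.log10(mean_power(speech) / mean_power(scaled_noise))


# Synthetic stand-ins: a 440 Hz tone as "speech", Gaussian samples as "noise".
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]

# The three training SNRs mentioned in the abstract.
for snr in (0, 10, 20):
    mixture, scaled = mix_at_snr(speech, noise, snr)
    print(snr, round(measured_snr_db(speech, scaled), 6))
```

At 0 dB the scaled noise carries the same average power as the speech, which is why that condition is the hardest of the three training levels; the additional test levels down to –5 dB place the noise above the speech in power.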
