Artificial Intelligence for Speech Classification and Enhancement of Speech and Language Disorders: Techniques, Applications, and Future Directions
Author(s) -
M S Remya,
Raghu Raman,
Ravi Sankaran,
Vinod Namboodiri,
Prema Nedungadi
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/ACCESS.2025.3620114
Subject(s) - Aerospace; Bioengineering; Communication, Networking and Broadcast Technologies; Components, Circuits, Devices and Systems; Computing and Processing; Engineered Materials, Dielectrics and Plasmas; Engineering Profession; Fields, Waves and Electromagnetics; General Topics for Engineers; Geoscience; Nuclear Engineering; Photonics and Electrooptics; Power, Energy and Industry Applications; Robotics and Control Systems; Signal Processing and Analysis; Transportation
Artificial intelligence (AI) has advanced the classification and management of speech and language disorders, conditions that impair speech, language, and communication abilities and that, in neurodegenerative disease, deteriorate progressively. Speech production relies on coordinated respiratory, phonatory, articulatory, and cognitive mechanisms. Disruptions in these systems, particularly due to neurodegenerative diseases such as Parkinson’s disease and Alzheimer’s disease, can lead to impairments such as dysarthria that affect clarity, fluency, and timing. This systematic literature review synthesizes insights from 231 studies (2007–2024) on machine learning (ML) and deep learning (DL) techniques for classifying and analyzing 16 distinct speech and language disorders, with an emphasis on neurodegenerative conditions. The review traces the progression from traditional ML approaches, such as support vector machines, to advanced DL architectures, including convolutional and recurrent neural networks, and explores the potential of hybrid and multimodal models. Feature extraction methods (acoustic, prosodic, and linguistic) are analyzed for their ability to capture the complexities of speech, and multimodal frameworks that integrate acoustic, linguistic, and physiological signals show promise for higher diagnostic accuracy. Beyond classification, emerging work focuses on DL-based speech enhancement methods aimed at improving intelligibility and signal quality for pathological speech. Techniques such as multi-scale feature fusion and attentive recalibration networks (MFFR-Net), noise-aware variational autoencoders, large language model (LLM)-integrated communication aids, and ultrasound-guided reconstruction for impaired articulation are improving the clarity of speech inputs for downstream diagnostic tasks. These approaches are especially beneficial in noisy environments and for low-resource users, providing improved preprocessing pipelines for classification systems. Key challenges include the lack of longitudinal and cross-cultural datasets and the limited explainability of AI models in clinical contexts. LLMs and adaptive techniques are identified as opportunities to enable disorder-agnostic frameworks and personalized care. Future research should prioritize multimodal feature integration, disorder-specific modeling, joint modeling of enhancement and classification, and the creation of more comprehensive datasets to increase the accuracy, generalizability, and clinical adoption of AI-based speech and language disorder classification systems.
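To make the classical pipeline described in the abstract concrete, the following is a minimal sketch of acoustic feature extraction (MFCCs) followed by a support vector machine baseline, one of the traditional ML approaches the review covers. It assumes Python with librosa and scikit-learn; the file names and labels are hypothetical placeholders, not data from the reviewed studies.

```python
# Minimal sketch of a classical ML pipeline for pathological-speech
# classification: acoustic feature extraction (MFCCs) pooled over time,
# followed by an SVM baseline. Paths and labels below are hypothetical.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mfcc_features(path, sr=16000, n_mfcc=13):
    """Load a recording and summarize frame-level MFCCs as a fixed vector."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Mean and standard deviation over time pool a variable-length
    # utterance into a single fixed-size feature vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical dataset: recordings with binary labels
# (0 = control speaker, 1 = disordered speech).
paths = ["control_01.wav", "control_02.wav", "patient_01.wav", "patient_02.wav"]
labels = [0, 0, 1, 1]

X = np.stack([mfcc_features(p) for p in paths])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, stratify=labels
)

# RBF-kernel SVM, a common baseline in the surveyed literature.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

The mean/std pooling step is one simple choice for mapping variable-length speech to fixed-dimension inputs; the DL architectures surveyed in the review instead operate on the frame-level (or raw-waveform) sequence directly.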