An Analytical Study of Speech Pathology Detection Based on MFCC and Deep Neural Networks
Author(s) - Mohammed Zakariah, B. Reshma, Yousef Ajami Alotaibi, Yanhui Guo, Kiet Tran-Trung, Mohammad Mamun Elahi
Publication year - 2022
Publication title - Computational and Mathematical Methods in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.462
H-Index - 48
eISSN - 1748-6718
pISSN - 1748-670X
DOI - 10.1155/2022/7814952
Subject(s) - computer science, mel frequency cepstrum, speech recognition, spectrogram, cepstrum, artificial neural network, voice analysis, artificial intelligence, identification (biology), sentence, pattern recognition (psychology), feature extraction, botany, biology
Diseases of internal organs other than the vocal folds can also affect a person’s voice. Voice problems are consequently on the rise, yet they are frequently overlooked. According to a recent study, voice pathology detection systems can effectively support the assessment of voice abnormalities and enable the early diagnosis of voice pathology. In particular, automatic systems that distinguish healthy from diseased voices have attracted considerable attention for the early identification and diagnosis of voice problems, and artificial intelligence-assisted voice analysis opens up new possibilities in healthcare. This work aimed to assess the utility of several automatic speech signal analysis methods for diagnosing voice disorders and to propose a strategy for classifying healthy and diseased voices. The proposed framework combines three voice features: chroma, mel spectrogram, and mel frequency cepstral coefficients (MFCC). We also designed a deep neural network (DNN) that learns from the extracted features and produces a highly accurate voice-based disease prediction model. The study describes a series of experiments using the Saarbruecken Voice Database (SVD) to detect abnormal voices. The model was developed and tested using the vowels /a/, /i/, and /u/ pronounced at high, low, and average pitches. We also retained the “continuous sentence” audio files collected from SVD to assess how well the developed model generalizes to completely new data. The highest accuracy achieved was 77.49%, superior to prior attempts in the same domain. Additionally, the model attains an accuracy of 88.01% when speaker gender information is integrated. When trained on selected diseases, the designed model reaches a maximum accuracy of 96.77% (cordectomy × healthy). These results suggest that the proposed framework is well suited to healthcare applications.
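
To make the three features named in the abstract concrete, the following is a minimal NumPy-only sketch of how chroma, mel spectrogram, and MFCC values might be computed for one recording and pooled into a single vector for a classifier such as a DNN. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not give the frame size, hop length, number of mel bands, number of coefficients, or the pooling scheme, so every parameter value and every function name below is hypothetical, and the synthetic tone stands in for a real SVD recording.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def power_spectrogram(signal, n_fft=512, hop=256):
    """Hann-windowed short-time power spectrum, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_bins)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def chroma_filterbank(sr, n_fft):
    """Assign each FFT bin to one of 12 pitch classes (0 = C, 9 = A)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((12, len(fft_freqs)))
    valid = fft_freqs > 20.0  # skip DC and sub-audible bins
    midi = 69.0 + 12.0 * np.log2(fft_freqs[valid] / 440.0)
    pitch_class = np.round(midi).astype(int) % 12
    fb[pitch_class, np.nonzero(valid)[0]] = 1.0
    return fb

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return x @ basis.T

def extract_features(signal, sr, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Return time-averaged [MFCC | log-mel | chroma] as one 1-D vector."""
    power = power_spectrogram(signal, n_fft, hop)
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    mfcc = dct_ii(log_mel, n_mfcc)
    chroma = power @ chroma_filterbank(sr, n_fft).T
    chroma = chroma / (chroma.sum(axis=1, keepdims=True) + 1e-10)
    return np.concatenate([mfcc.mean(axis=0),
                           log_mel.mean(axis=0),
                           chroma.mean(axis=0)])

# Stand-in input: one second of a 440 Hz tone instead of a real vowel recording.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2.0 * np.pi * 440.0 * t)
features = extract_features(tone, sr)
print(features.shape)  # (65,) = 13 MFCC + 40 mel bands + 12 chroma bins
```

Averaging over time gives a fixed-length vector regardless of recording duration, which is one simple way to feed sustained vowels of varying length to a fully connected network; whether the paper pools features this way is not stated in the abstract. In practice an audio library would typically replace the hand-written filterbanks.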
