Open Access

An Analytical Study of Speech Pathology Detection Based on MFCC and Deep Neural Networks

Author(s) - Mohammed Zakariah, B. Reshma, Yousef Ajami Alotaibi, Yanhui Guo, Kiet Tran-Trung, Mohammad Mamun Elahi

Publication year - 2022

Publication title - Computational and Mathematical Methods in Medicine

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.462

H-Index - 48

eISSN - 1748-6718

pISSN - 1748-670X

DOI - 10.1155/2022/7814952

Subject(s) - computer science, mel frequency cepstrum, speech recognition, spectrogram, cepstrum, artificial neural network, voice analysis, artificial intelligence, identification (biology), sentence, pattern recognition (psychology), feature extraction, botany, biology

Diseases of internal organs other than the vocal folds can also affect a person’s voice. As a result, voice problems are on the rise, even though they are frequently overlooked. According to a recent study, voice pathology detection systems can aid the assessment of voice abnormalities and enable the early diagnosis of voice pathology. In particular, automatic systems for distinguishing healthy from diseased voices have received considerable attention for the early identification and diagnosis of voice problems. Artificial intelligence-assisted voice analysis therefore opens up new possibilities in healthcare. This work aimed to assess the utility of several automatic speech signal analysis methods for diagnosing voice disorders and to suggest a strategy for classifying healthy and diseased voices. The proposed framework combines three voice features: chroma, mel spectrogram, and mel frequency cepstral coefficients (MFCC). We also designed a deep neural network (DNN) capable of learning from the extracted features and producing a highly accurate voice-based disease prediction model. The study describes a series of experiments using the Saarbruecken Voice Database (SVD) to detect abnormal voices. The model was developed and tested using the vowels /a/, /i/, and /u/ pronounced at high, low, and average pitches. We also retained the “continuous sentence” audio files from SVD to assess how well the developed model generalizes to completely new data. The highest accuracy achieved was 77.49%, superior to prior attempts in the same domain. Additionally, the model attains an accuracy of 88.01% when speaker gender information is integrated. Trained on selected diseases, the model reaches a maximum accuracy of 96.77% (cordectomy × healthy). As a result, the suggested framework is well suited to the healthcare industry.
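To make the three features named in the abstract concrete, the following is a minimal NumPy-only sketch of how chroma, mel spectrogram, and MFCC values might be computed from one recording and combined into a single input vector for a classifier. It is not the authors' pipeline: the abstract does not state frame sizes, filter counts, or how features are aggregated over time, so the parameter values, the time-averaging step, and every function name here (`power_spectrogram`, `mel_filterbank`, `dct_ii`, `chroma_from_power`, `voice_features`) are assumptions chosen for illustration. A synthetic 440 Hz tone stands in for a sustained vowel.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrogram(y, n_fft=1024, hop=256):
    """Hann-windowed short-time power spectrum, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    starts = range(0, len(y) - n_fft + 1, hop)
    frames = np.stack([y[s:s + n_fft] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, bins)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i:i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct_ii(x, n_out):
    """Unnormalised DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))  # (n_out, n)
    return x @ basis.T

def chroma_from_power(power, sr, n_fft):
    """Fold FFT bins into 12 pitch classes (0 = C, 9 = A); each frame sums to 1."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    audible = freqs >= 27.5  # skip DC and very low bins
    midi = 69.0 + 12.0 * np.log2(freqs[audible] / 440.0)
    pitch_class = np.round(midi).astype(int) % 12
    chroma = np.zeros((power.shape[0], 12))
    for pc in range(12):
        chroma[:, pc] = power[:, audible][:, pitch_class == pc].sum(axis=1)
    return chroma / (chroma.sum(axis=1, keepdims=True) + 1e-12)

def voice_features(y, sr, n_fft=1024, hop=256, n_mels=40, n_mfcc=13):
    """Time-averaged MFCC, log-mel, and chroma values as one flat vector."""
    power = power_spectrogram(y, n_fft, hop)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (frames, n_mels)
    log_mel = np.log(mel + 1e-10)
    mfcc = dct_ii(log_mel, n_mfcc)                       # (frames, n_mfcc)
    chroma = chroma_from_power(power, sr, n_fft)         # (frames, 12)
    return np.concatenate([mfcc.mean(0), log_mel.mean(0), chroma.mean(0)])

# Assumed stand-in for a vowel recording: one second of a 440 Hz tone.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
features = voice_features(tone, sr)
print(features.shape)  # (65,) = 13 MFCC + 40 log-mel + 12 chroma
```

In practice a dedicated audio library would normally replace the hand-written filterbank and DCT, and the resulting vector (optionally extended with extra inputs such as speaker gender) would be fed to the network. Averaging over time discards temporal detail, which is one simple choice among several.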
