Comparison of discrete transforms for deep‐neural‐networks‐based speech enhancement
Author(s) -
Jassim Wissam A.,
Harte Naomi
Publication year - 2022
Publication title -
iet signal processing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.384
H-Index - 42
eISSN - 1751-9683
pISSN - 1751-9675
DOI - 10.1049/sil2.12109
Subject(s) - discrete cosine transform , discrete sine transform , computer science , discrete fourier transform (general) , discrete hartley transform , speech recognition , convolutional neural network , deep learning , artificial neural network , speech enhancement , artificial intelligence , speech processing , pattern recognition (psychology) , algorithm , fourier transform , fractional fourier transform , mathematics , noise reduction , image (mathematics) , fourier analysis , mathematical analysis
In recent studies of speech enhancement, a deep-learning model is trained to predict clean speech spectra from the known noisy speech spectra. Rather than using the traditional discrete Fourier transform (DFT), this paper considers other well-known transforms for generating the speech spectra in deep-learning-based speech enhancement. In addition to the DFT, seven transforms were tested: the discrete cosine transform, discrete sine transform, discrete Haar transform, discrete Hadamard transform, discrete Tchebichef transform, discrete Krawtchouk transform, and discrete Tchebichef-Krawtchouk transform. Two deep-learning architectures were tested: convolutional neural networks (CNNs) and fully connected neural networks. Experiments were performed on the NOIZEUS database, and various speech quality and intelligibility measures were adopted for performance evaluation. The quality and intelligibility scores of the enhanced speech demonstrate that the discrete sine transform is better suited to front-end processing with a CNN, as it outperformed the DFT in this application. The results also demonstrate that combining two or more existing transforms can improve performance under specific conditions. The tested models suggest that the DFT should not be assumed optimal for front-end processing with deep neural networks (DNNs). On this basis, other discrete transforms should be considered when designing robust DNN-based speech-processing applications.
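As a rough illustration of the pipeline the abstract describes (a sketch, not the authors' implementation), the code below frames a noisy signal, applies a discrete cosine transform per frame in place of the DFT, and maps the noisy spectra to clean spectra with a small fully connected network. The frame length, hop size, layer sizes, and loss are arbitrary choices made for this example.

    import numpy as np
    from scipy.fft import dct, idct  # DCT-II and its inverse, standing in for the DFT front end
    import torch
    import torch.nn as nn

    FRAME_LEN, HOP = 256, 128  # illustrative framing choices, not from the paper

    def frames(signal):
        """Slice a 1-D signal into overlapping frames (windowing omitted for brevity)."""
        n = 1 + (len(signal) - FRAME_LEN) // HOP
        idx = np.arange(FRAME_LEN)[None, :] + HOP * np.arange(n)[:, None]
        return signal[idx]

    def dct_spectra(signal):
        """Per-frame DCT spectra: the front-end features replacing DFT spectra."""
        return dct(frames(signal), type=2, norm='ortho', axis=1)

    # A small fully connected network predicting clean spectra from noisy spectra,
    # one frame at a time (layer sizes are illustrative).
    model = nn.Sequential(
        nn.Linear(FRAME_LEN, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, FRAME_LEN),
    )

    def train_step(noisy, clean, optim):
        """One MSE training step on a paired noisy/clean utterance."""
        x = torch.from_numpy(dct_spectra(noisy)).float()
        y = torch.from_numpy(dct_spectra(clean)).float()
        loss = nn.functional.mse_loss(model(x), y)
        optim.zero_grad(); loss.backward(); optim.step()
        return loss.item()

    def enhance(noisy):
        """Predict clean spectra and invert the DCT to recover enhanced frames."""
        with torch.no_grad():
            est = model(torch.from_numpy(dct_spectra(noisy)).float()).numpy()
        return idct(est, type=2, norm='ortho', axis=1)  # overlap-add reconstruction omitted

    # Usage: optim = torch.optim.Adam(model.parameters(), lr=1e-3), then call
    # train_step(noisy, clean, optim) over paired utterances and enhance(noisy) at test time.

Because only dct_spectra and enhance touch the transform, trying the discrete sine transform the paper favours amounts to swapping in scipy.fft.dst and scipy.fft.idst; the Haar, Hadamard, and moment-based transforms would substitute in the same way.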
