Multi‐scale audio super resolution via deep pyramid wavelet convolutional neural network

Si Binqiang; Luo Dongqi; Zhu Jihong


Open Access


Author(s) - Si Binqiang, Luo Dongqi, Zhu Jihong

Publication year - 2021

Publication title - Electronics Letters

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.375

H-Index - 146

eISSN - 1350-911X

pISSN - 0013-5194

DOI - 10.1049/ell2.12180

Subject(s) - wavelet, computer science, convolutional neural network, pyramid (geometry), artificial intelligence, wavelet transform, focus (optics), signal (programming language), speech recognition, pattern recognition (psychology), activation function, artificial neural network, mathematics, physics, geometry, optics, programming language

In this letter, a pyramid wavelet convolutional neural network for audio super resolution is presented. Because the audio signal is non‐stationary, previous convolutional neural network based approaches may fail to capture its details: these methods usually focus on the global approximation error and thus produce over-smooth results. To cope with this issue, it is suggested to predict the wavelet coefficients of the audio signal and to reconstruct the signal from these coefficients stage by stage. The prediction errors of the wavelet coefficients are included in the loss function to force the model to capture the detail components. Experimental results show that the approach, trained on the VCTK public dataset, achieves more appealing results than state‐of‐the‐art methods.
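The idea in the abstract can be illustrated with a minimal sketch: decompose a signal into a wavelet pyramid, rebuild it stage by stage, and add coefficient errors to the loss. The abstract does not name the wavelet, the number of levels, or the loss weighting, so the Haar wavelet, the `levels` argument, the `detail_weight` parameter, and all function names below are assumptions chosen for illustration, not the authors' implementation. The network itself is omitted; `pred_approx` and `pred_details` stand in for its outputs.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one Haar level, doubling the length of the signal."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

def haar_pyramid(signal, levels):
    """Return (coarsest approximation, detail bands ordered coarsest first)."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    details.reverse()
    return approx, details

def reconstruct(approx, details):
    """Rebuild the signal stage by stage, from coarse to fine."""
    for detail in details:
        approx = haar_idwt(approx, detail)
    return approx

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def wavelet_loss(pred_approx, pred_details, target, detail_weight=1.0):
    """Global reconstruction error plus weighted wavelet-coefficient errors.

    detail_weight is a hypothetical hyperparameter, not taken from the paper.
    """
    true_approx, true_details = haar_pyramid(target, len(pred_details))
    signal_err = mse(reconstruct(pred_approx, pred_details), target)
    coeff_err = mse(pred_approx, true_approx) + sum(
        mse(p, t) for p, t in zip(pred_details, true_details)
    )
    return signal_err + detail_weight * coeff_err

# Toy usage: a perfect prediction versus one that drops every detail band.
target = [1.0, 3.0, 2.0, 0.0, 4.0, 4.0, 1.0, 5.0]
approx, details = haar_pyramid(target, levels=2)
smooth_details = [[0.0] * len(d) for d in details]
perfect_loss = wavelet_loss(approx, details, target)
smooth_loss = wavelet_loss(approx, smooth_details, target)
```

In this toy case, the prediction with zeroed detail bands reconstructs a piecewise-constant signal, and the coefficient terms penalise the missing bands directly in addition to the global error. That is one plausible reading of how a coefficient-level loss could discourage over-smooth output.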
