
Real‐time speech enhancement using optimised empirical mode decomposition and non‐local means estimation
Author(s) - Vumanthala Sagar Reddy, Kalagadda Bikshalu
Publication year - 2020
Publication title - IET Computers and Digital Techniques
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.219
H-Index - 46
eISSN - 1751-861X
pISSN - 1751-8601
DOI - 10.1049/iet-cdt.2020.0034
Subject(s) - hilbert–huang transform , computer science , speech enhancement , noise (video) , speech recognition , interpolation (computer graphics) , mode (computer interface) , mean squared error , algorithm , field programmable gate array , noise reduction , artificial intelligence , filter (signal processing) , mathematics , computer hardware , statistics , motion (physics) , image (mathematics) , operating system , computer vision
In this study, the authors present a novel speech enhancement method that combines non‐local means (NLM) estimation with optimised empirical mode decomposition (OEMD) using cubic‐spline interpolation. The parameters that govern performance are optimised using the path‐finder algorithm. First, the noisy speech signal is decomposed into a set of scaled signals called intrinsic‐mode functions (IMFs) by means of the sifting process of the OEMD approach. Each IMF contains essential information about the signal at a particular scale or frequency band. The IMFs are then processed by NLM estimation, which exploits the non‐local similarities present in each IMF to reduce the ill effects of interfering noise. The proposed NLM‐based method is effective at eliminating low‐frequency noise. A field‐programmable gate array (FPGA) architecture is implemented using Xilinx ISE 14.5, and the proposed method achieves a higher signal‐to‐noise ratio (SNR) and a lower mean‐square error than other approaches. The performance evaluation is carried out on speech signals taken from the TIMIT database and noises taken from the NOISEX‐92 database at SNR levels of 0, 5 and 10 dB.
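
The pipeline described in the abstract (sifting with cubic‐spline envelopes, NLM estimation on each IMF, then reconstruction) can be illustrated with a minimal software sketch. This is not the authors' implementation: the function names (sift_once, emd, nlm_1d, enhance, snr_db), the fixed number of sifting passes, the end‐point handling of the envelopes and the NLM parameters (patch, search, h_scale) are all assumptions chosen for illustration. The path‐finder optimisation and the FPGA architecture are not reproduced, and a synthetic tone stands in for TIMIT speech and NOISEX‐92 noise.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def sift_once(x):
    """One sifting pass: subtract the mean of the cubic-spline envelopes."""
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: treat x as the residue
    # Assumption: pin both envelopes to the end samples so the splines
    # cover the whole signal (a simple boundary treatment).
    upper = CubicSpline(np.r_[0, maxima, t[-1]], np.r_[x[0], x[maxima], x[-1]])(t)
    lower = CubicSpline(np.r_[0, minima, t[-1]], np.r_[x[0], x[minima], x[-1]])(t)
    return x - 0.5 * (upper + lower)


def emd(x, max_imfs=6, n_sift=10):
    """Split x into IMFs plus a residue using a fixed number of sifting passes."""
    imfs, residue = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        h = sift_once(residue)
        if h is None:
            break
        for _ in range(n_sift - 1):
            h_next = sift_once(h)
            if h_next is None:
                break
            h = h_next
        imfs.append(h)
        residue = residue - h  # so that sum(imfs) + residue equals x
    return imfs, residue


def nlm_1d(x, patch=5, search=50, h=0.3):
    """1-D non-local means: average samples whose neighbourhoods look alike."""
    n = len(x)
    padded = np.pad(x, patch, mode="reflect")
    # patches[i] is the window of length 2*patch+1 centred on x[i]
    patches = np.lib.stride_tricks.sliding_window_view(padded, 2 * patch + 1)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - search), min(n, i + search + 1)
        d2 = np.mean((patches[lo:hi] - patches[i]) ** 2, axis=1)
        w = np.exp(-d2 / h ** 2)  # similar patches get weights close to 1
        out[i] = np.dot(w, x[lo:hi]) / w.sum()
    return out


def enhance(noisy, h_scale=0.5):
    """Denoise each IMF with NLM, then rebuild the signal."""
    imfs, residue = emd(noisy)
    # Assumption: the NLM bandwidth is tied to each IMF's standard deviation.
    denoised = [nlm_1d(imf, h=h_scale * np.std(imf) + 1e-12) for imf in imfs]
    return np.sum(denoised, axis=0) + residue


def snr_db(clean, estimate):
    """Output SNR in dB of an estimate relative to the clean reference."""
    return 10 * np.log10(np.sum(clean ** 2) / np.sum((clean - estimate) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 2000)
    clean = np.sin(2 * np.pi * 5 * t) * np.exp(-2 * t)  # synthetic stand-in
    noisy = clean + 0.2 * rng.standard_normal(t.size)
    print("input SNR (dB): ", snr_db(clean, noisy))
    print("output SNR (dB):", snr_db(clean, enhance(noisy)))
```

The fixed parameters here play the role that the optimised parameters play in the described method, so the SNR values this sketch prints on a synthetic signal say nothing about the results reported for the proposed approach.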