Open Access
Adaptive spectral smoothening for development of robust keyword spotting system
Author(s) -
Pattanayak Biswaranjan,
Rout Jayant Kumar,
Pradhan Gayadhar
Publication year - 2019
Publication title -
IET Signal Processing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.384
H-Index - 42
eISSN - 1751-9683
pISSN - 1751-9675
DOI - 10.1049/iet-spr.2019.0027
Subject(s) - keyword spotting, computer science, speech recognition, mel frequency cepstrum, noise (video), prosody, cepstrum, mode (computer interface), pattern recognition (psychology), artificial intelligence, feature extraction, image (mathematics), operating system
It is well known that the performance of a keyword spotting (KWS) system degrades significantly when training and test conditions are mismatched. In this work, an approach is proposed for reducing the mismatch between training and test speech caused by speaker‐related variability and environmental noise. In the proposed approach, variational‐mode decomposition is first performed on the short‐term magnitude spectra to decompose them adaptively into a number of variational mode functions (VMFs). Sufficiently smoothed spectra are then reconstructed by selecting only two lower‐frequency VMFs. When the KWS system is developed using Mel‐frequency cepstral coefficients (MFCCs) extracted from the smoothed spectra, significantly improved performance is observed under pitch‐ and noise‐mismatched test conditions. To further suppress the mismatch due to the pitch and speaking rate of the speakers, data‐augmented training based on explicit prosody modification is performed. The experimental results presented in this study show that data‐augmented training further enhances the performance of the developed KWS system.
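
The spectral smoothing step described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it is a simplified variational-mode decomposition written with NumPy only, omitting the boundary mirroring and the Lagrangian multiplier update of the full algorithm. The function names (vmd, smooth_spectrum), the number of modes (4), the bandwidth penalty alpha (2000) and the reading of "two lower-frequency VMFs" as the two modes with the lowest center frequencies are all assumptions made for illustration; the abstract does not state these settings.

```python
import numpy as np


def vmd(signal, num_modes=4, alpha=2000.0, num_iters=200, tol=1e-7):
    """Simplified VMD of a real 1-D sequence (illustrative sketch).

    Returns (modes, center_freqs), both sorted by ascending center frequency.
    Assumed simplifications: no mirror extension at the boundaries and no
    Lagrangian multiplier update.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    freqs = np.fft.rfftfreq(n)                      # normalized, 0 .. 0.5
    x_hat = np.fft.rfft(x)
    u_hat = np.zeros((num_modes, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(num_modes) / num_modes  # uniform initial centers

    for _ in range(num_iters):
        u_prev = u_hat.copy()
        for k in range(num_modes):
            # Residual left for mode k once all other modes are removed.
            residual = x_hat - u_hat.sum(axis=0) + u_hat[k]
            # Wiener-like filter centered on the current estimate omega[k].
            u_hat[k] = residual / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            # Move the center to the power-weighted mean frequency of the mode.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Stop once the modes barely change between iterations.
        change = np.sum(np.abs(u_hat - u_prev) ** 2)
        if change < tol * (np.sum(np.abs(u_prev) ** 2) + 1e-12):
            break

    order = np.argsort(omega)
    modes = np.fft.irfft(u_hat[order], n=n, axis=1)
    return modes, omega[order]


def smooth_spectrum(magnitude, num_keep=2, **vmd_kwargs):
    """Rebuild a magnitude spectrum from its num_keep lowest-frequency modes."""
    modes, _ = vmd(magnitude, **vmd_kwargs)
    return modes[:num_keep].sum(axis=0)


if __name__ == "__main__":
    # Synthetic voiced-like frame: a 150 Hz harmonic series sampled at 16 kHz.
    fs, n_fft = 16000, 512
    t = np.arange(n_fft) / fs
    frame = sum(np.sin(2 * np.pi * 150 * h * t) / h for h in range(1, 20))
    magnitude = np.abs(np.fft.rfft(frame * np.hamming(n_fft)))
    smoothed = smooth_spectrum(magnitude)
    print(magnitude.shape, smoothed.shape)
```

Here the decomposition runs along the frequency axis of a single frame, so the rapid ripple caused by pitch harmonics falls into the higher-frequency modes and is dropped at reconstruction. The reconstructed values may dip slightly below zero, so a small positive floor would likely be needed before taking logarithms for MFCC extraction.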