
A New Approach for Speech Keyword Spotting in Noisy Environment
Author(s) - Peiwen Ye, Duan Huang
Publication year - 2022
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5121/csit.2022.120206
Subject(s) - keyword spotting, computer science, speech recognition, convolution (computer science), residual, noise (video), spotting, set (abstract data type), artificial intelligence, pattern recognition (psychology), algorithm, artificial neural network, image (mathematics), programming language
Keyword spotting detects wake-up keywords in a continuous voice stream and is widely used in products such as mobile devices and smart home devices. Recently, DNNs have come to dominate keyword spotting and have dramatically improved performance. However, few researchers have addressed noise in speech keyword recognition. We therefore propose an architecture for keyword detection in noisy scenarios. Our framework combines an attention mechanism and a residual structure on a CNN backbone. In addition, we use separable convolution to reduce the number of model parameters, which makes the model suitable for embedded devices. Noise from various scenes is used for data augmentation to boost performance. The proposed method achieves an accuracy of 94.93% on the noisy test set based on the Google Speech Commands dataset. We also compare the proposed method with an RNN-based algorithm and show that our model achieves higher accuracy with fewer parameters.
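
To illustrate why separable convolution reduces the parameter count, the following minimal sketch compares a standard 2-D convolution with a depthwise separable one (a per-channel depthwise filter followed by a 1x1 pointwise convolution). The layer sizes (64 input channels, 64 output channels, 3x3 kernel) and the function names are assumptions chosen for illustration; they are not taken from the paper, and the abstract does not state which separable variant the authors use.

```python
def conv2d_params(c_in, c_out, k_h, k_w, bias=True):
    """Parameter count of a standard 2-D convolution layer."""
    return c_in * c_out * k_h * k_w + (c_out if bias else 0)


def separable_conv2d_params(c_in, c_out, k_h, k_w, bias=True):
    """Parameter count of a depthwise separable convolution.

    Depthwise stage: one k_h x k_w filter per input channel.
    Pointwise stage: a 1x1 convolution mixing c_in channels into c_out.
    """
    depthwise = c_in * k_h * k_w + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes, not from the paper: 64 -> 64 channels, 3x3 kernel.
standard = conv2d_params(64, 64, 3, 3)
separable = separable_conv2d_params(64, 64, 3, 3)
print(standard, separable)  # 36928 4800
```

For this hypothetical layer the separable form needs roughly one eighth of the parameters, which is the kind of saving that makes a model easier to fit on embedded hardware.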