
Sound Event Detection Based on Convolutional Neural Networks with Overlapping Pooling Structure
Author(s) - Hang Zhu, Hongjie Wan
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1924/1/012008
Subject(s) - pooling, convolutional neural network, mel frequency cepstrum, computer science, pattern recognition (psychology), dropout (neural networks), hidden markov model, artificial intelligence, feature (linguistics), kernel (algebra), speech recognition, event (particle physics), feature extraction, machine learning, mathematics, linguistics, philosophy, physics, combinatorics, quantum mechanics
In this paper, a sound event detection method is proposed. The method is based on convolutional neural networks (CNNs) with an overlapping pooling structure. Unlike the traditional GMM-HMM and DNN-HMM models, the CNN model uses convolutional layers, which can speed up training by reducing the number of trainable parameters. The extracted sound feature is the mel-frequency cepstral coefficient (MFCC). A dropout layer is added to the convolutional layer. Over-fitting can decrease detection accuracy, and the dropout layer helps prevent the model from over-fitting. Moreover, the CNN uses an overlapping pooling structure in which the stride is smaller than the pooling kernel size. Adjacent pooling windows therefore overlap, which can increase the richness of the extracted features. The experimental results show that the precision of the proposed CNN model is more robust than that of the GMM-HMM model and the baseline model.
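To make the overlapping pooling idea concrete (stride smaller than the pooling kernel), here is a minimal pure-Python sketch of 2-D max pooling over a small feature map, such as an MFCC matrix of frames by coefficients. The helper `max_pool_2d` is hypothetical and is not the authors' code. The kernel size 3 and stride 2 used for the overlapping case are illustrative assumptions and are not values reported in the paper.

```python
from typing import List

Matrix = List[List[float]]


def max_pool_2d(x: Matrix, kernel: int, stride: int) -> Matrix:
    """Square max pooling without padding (illustrative sketch).

    If stride < kernel, neighbouring windows share rows and columns,
    which is the "overlapping pooling" case. If stride == kernel,
    the windows tile the input without overlap.
    """
    if kernel < 1 or stride < 1:
        raise ValueError("kernel and stride must be positive")
    rows, cols = len(x), len(x[0])
    if kernel > rows or kernel > cols:
        raise ValueError("kernel larger than input")
    # Number of window positions along each axis (no padding).
    out_rows = (rows - kernel) // stride + 1
    out_cols = (cols - kernel) // stride + 1
    out: Matrix = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            r0, c0 = i * stride, j * stride  # top-left corner of this window
            window = [
                x[r][c]
                for r in range(r0, r0 + kernel)
                for c in range(c0, c0 + kernel)
            ]
            row.append(max(window))
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical 5x5 feature map with values 0..24.
    feature_map = [[r * 5 + c for c in range(5)] for r in range(5)]
    # Non-overlapping: kernel == stride.
    print(max_pool_2d(feature_map, kernel=2, stride=2))  # [[6, 8], [16, 18]]
    # Overlapping: stride < kernel, so windows share a row/column.
    print(max_pool_2d(feature_map, kernel=3, stride=2))  # [[12, 14], [22, 24]]
```

With kernel 3 and stride 2, each window shares kernel - stride = 1 row and 1 column with its neighbour. Each pooled value therefore summarizes a larger region that overlaps the adjacent regions. In the non-overlapping kernel 2, stride 2 case, the last row and column of the 5x5 map are never pooled at all.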