Open Access
Co-occurrence Based Approach for Differentiation of Speech and Song
Author(s) -
Arijit Ghosal,
Ranjit Ghoshal
Publication year - 2021
Publication title -
AIJR Proceedings
Language(s) - English
Resource type - Conference proceedings
ISSN - 2582-3922
DOI - 10.21467/proceedings.115.17
Subject(s) - speech recognition, computer science, feature (linguistics), signal (programming language), set (abstract data type), energy (signal processing), speech processing, voice activity detection, frequency domain, artificial intelligence, pattern recognition (psychology), linguistics, mathematics, philosophy, statistics, computer vision, programming language
Discriminating speech from song in an audio signal is an exciting research topic. Preceding efforts mainly addressed discriminating speech from non-speech; comparatively little work has targeted discriminating speech from song. Speech/song discrimination is a noteworthy part of automatic audio classification because it is considered a fundamental step in hierarchical approaches to genre identification and audio archive generation. Previous efforts at discriminating speech and song employed frequency-domain and perceptual-domain aural features. This work proposes an acoustic feature that is low-dimensional as well as easy to compute. It is observed that the energy levels of speech and song signals differ largely because a speech signal lacks instrumental accompaniment in the background. Short Time Energy (STE) is the acoustic feature best able to reflect this scenario. For a precise study of energy variation, a co-occurrence matrix of STE is generated and statistical features are extracted from it. For classification, some well-known supervised classifiers have been employed in this effort. The performance of the proposed feature set has been compared with that of other efforts to demonstrate its superiority.
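The pipeline outlined in the abstract — frame-wise STE, a co-occurrence matrix over quantized STE levels of adjacent frames, and statistical features drawn from that matrix — can be sketched as below. The frame length, hop size, number of quantization levels, and the specific Haralick-style statistics are illustrative assumptions, not the authors' exact choices.

```python
import numpy as np

def short_time_energy(signal, frame_len=256, hop=128):
    """Frame-wise short-time energy: sum of squared samples per frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sum(f ** 2) for f in frames])

def ste_cooccurrence(ste, levels=8):
    """Co-occurrence matrix of quantized STE values of adjacent frames."""
    # Normalize STE to [0, 1], then quantize into discrete levels.
    span = ste.max() - ste.min()
    norm = (ste - ste.min()) / (span + 1e-12)
    q = np.minimum((norm * levels).astype(int), levels - 1)
    C = np.zeros((levels, levels))
    for a, b in zip(q[:-1], q[1:]):   # count adjacent-frame level pairs
        C[a, b] += 1
    return C / max(C.sum(), 1)        # normalize to joint probabilities

def cooccurrence_features(C):
    """Haralick-style statistics (assumed set): energy, entropy,
    contrast, homogeneity."""
    i, j = np.indices(C.shape)
    nz = C[C > 0]
    return {
        "energy": float(np.sum(C ** 2)),
        "entropy": float(-np.sum(nz * np.log2(nz))),
        "contrast": float(np.sum((i - j) ** 2 * C)),
        "homogeneity": float(np.sum(C / (1.0 + np.abs(i - j)))),
    }

# Example: a synthetic "speech-like" signal with alternating
# bursts and silences, giving large frame-to-frame energy swings.
rng = np.random.default_rng(0)
t = np.arange(16000)
envelope = (np.sin(2 * np.pi * t / 4000) > 0).astype(float)
sig = envelope * rng.standard_normal(len(t))

ste = short_time_energy(sig)
C = ste_cooccurrence(ste)
feats = cooccurrence_features(C)
```

The resulting feature vector (here four values per signal) is what would be fed to the supervised classifiers; song signals, with their steadier instrumental background, would be expected to yield a co-occurrence matrix concentrated nearer the diagonal than speech.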