
Statistical Model‐Based Voice Activity Detection Based on Second‐Order Conditional MAP with Soft Decision
Author(s) - Chang JoonHyuk
Publication year - 2012
Publication title - ETRI Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.295
H-Index - 46
eISSN - 2233-7326
pISSN - 1225-6463
DOI - 10.4218/etrij.12.0111.0344
Subject(s) - computer science , frame (networking) , inter frame , maximum a posteriori estimation , a priori and a posteriori , voice activity detection , speech recognition , algorithm , artificial intelligence , mathematics , speech processing , reference frame , maximum likelihood , statistics , telecommunications , philosophy , epistemology
In this paper, we propose a novel approach to statistical model‐based voice activity detection (VAD) that incorporates a second‐order conditional maximum a posteriori (CMAP) criterion. To improve on the first‐order CMAP criterion in [1], we consider both the current observation and the voice activity decisions in the previous two frames, which exploits the interframe correlation of voice activity more fully. Whereas the approach in [1] conditions on the decision in the previous single frame, the second‐order CMAP conditions on the decisions in the previous two frames and therefore has four thresholds, one for each combination of the two previous decisions, giving an additional degree of freedom. A soft‐decision scheme is also incorporated, which yields time‐varying thresholds and further improves performance. Experimental results show that the proposed algorithm outperforms the conventional CMAP‐based VAD technique under various experimental conditions.
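The hard-decision part of a second-order CMAP rule can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that a per-frame likelihood ratio is already available from some statistical model, and that the current observation is independent of the earlier decisions once the current hypothesis is given. Under that assumption the MAP rule reduces to comparing the likelihood ratio with a threshold equal to P(H0 | history) / P(H1 | history). The function names and the conditional prior values below are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# (decision at frame n-1, decision at frame n-2); 1 = speech, 0 = non-speech
History = Tuple[int, int]


def thresholds_from_priors(p_speech: Dict[History, float]) -> Dict[History, float]:
    """Turn P(H1 | history) into one likelihood-ratio threshold per history.

    threshold[h] = P(H0 | h) / P(H1 | h), which follows from the MAP rule
    under the conditional-independence assumption stated in the lead-in.
    """
    return {h: (1.0 - p) / p for h, p in p_speech.items()}


def second_order_cmap_vad(
    likelihood_ratios: Sequence[float],
    eta: Dict[History, float],
    init: History = (0, 0),
) -> List[int]:
    """Decide speech (1) or non-speech (0) per frame using four thresholds."""
    prev1, prev2 = init
    decisions: List[int] = []
    for lr in likelihood_ratios:
        # Pick the threshold that matches the two most recent decisions.
        d = 1 if lr > eta[(prev1, prev2)] else 0
        decisions.append(d)
        # Shift the decision history by one frame.
        prev1, prev2 = d, prev1
    return decisions


# Hypothetical conditional priors P(speech | two previous decisions).
p_speech = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.4, (0, 0): 0.1}
eta = thresholds_from_priors(p_speech)

# Hypothetical likelihood ratios for six consecutive frames.
decisions = second_order_cmap_vad([0.5, 10.0, 0.8, 0.5, 0.05, 0.5], eta)
print(decisions)  # [0, 1, 1, 1, 0, 0]
```

In this toy run the same likelihood ratio of 0.5 is classified as non-speech in the first frame, as speech in the fourth, and as non-speech again in the sixth, because the threshold depends on the two preceding decisions. The soft-decision extension, whose exact form the abstract does not give, is left out of the sketch.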