Open Access
Sparse DNN‐based speaker segmentation using side information
Author(s) - Ma Yong, Bao ChangChun
Publication year - 2015
Publication title - Electronics Letters
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.375
H-Index - 146
eISSN - 1350-911X
pISSN - 0013-5194
DOI - 10.1049/el.2015.0298
Subject(s) - timit , computer science , speech recognition , pattern recognition (psychology) , segmentation , cluster analysis , speaker diarisation , artificial intelligence , feature (linguistics) , speaker recognition , feature extraction , frame (networking) , artificial neural network , speech processing , bayesian information criterion , encoder , hidden markov model , telecommunications , linguistics , philosophy , operating system
Sparse deep neural networks (SDNNs) are proposed for speaker segmentation. First, the SDNNs are trained using side information, namely the class label of each input. Then, the SDNNs extract speaker-specific features from the super-vector feature of the speech signal. Finally, a label is assigned to each speech frame by K-means clustering, and these frame labels are used to segment the different speakers in a continuous speech stream. A performance evaluation on a multi-speaker speech stream corpus generated from the TIMIT database shows that the proposed speaker segmentation algorithm outperforms both the Bayesian information criterion (BIC) method and the deep auto-encoder network method.
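The abstract gives no implementation details, so the following is only a minimal sketch of the final stage it describes: clustering per-frame feature vectors with K-means and collapsing the resulting frame labels into speaker segments. It assumes the speaker-specific features have already been extracted (the SDNN training and super-vector construction are not modelled), that the number of speakers `k` is known, that Euclidean distance is used, and that K-means is seeded by farthest-first traversal. These choices, and the function names `kmeans_labels` and `labels_to_segments`, are hypothetical, not taken from the letter.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two frame feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(features: Sequence[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding (an assumption; the letter does not specify one):
    start at frame 0, then repeatedly add the frame farthest from all
    centroids chosen so far."""
    centroids = [list(features[0])]
    while len(centroids) < k:
        nxt = max(features, key=lambda f: min(sq_dist(f, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans_labels(features: Sequence[Vector], k: int, max_iters: int = 100) -> List[int]:
    """Lloyd's K-means over frame features; returns one cluster label per frame."""
    centroids = farthest_first_init(features, k)
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each frame goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(f, centroids[c])) for f in features]
        if new_labels == labels:
            break  # converged: no frame changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned frames.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def labels_to_segments(labels: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Collapse per-frame labels into (start_frame, end_frame_exclusive, label) runs."""
    segments: List[Tuple[int, int, int]] = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments


if __name__ == "__main__":
    # Toy 2-D vectors standing in for SDNN output features (hypothetical data):
    # speaker A, then speaker B, then speaker A again.
    a = [[0.0, 0.0], [0.2, 0.1], [0.1, -0.1], [-0.1, 0.2], [0.0, 0.1]]
    b = [[10.0, 10.0], [10.2, 9.9], [9.8, 10.1], [10.1, 10.0], [9.9, 9.8]]
    frames = a + b + a
    print(labels_to_segments(kmeans_labels(frames, k=2)))
    # [(0, 5, 0), (5, 10, 1), (10, 15, 0)]
```

In this sketch the speaker change points are the start frames of every segment after the first (frames 5 and 10 in the toy example). A practical system would likely also smooth the frame labels to suppress spurious single-frame switches; that step is not described in the abstract and is omitted here.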