Neural Speech Turn Segmentation and Affinity Propagation for Speaker Diarization

Open Access

Author(s) - Ruiqing Yin, Hervé Bredin, Claude Barras

Publication year - 2018

Publication title - Interspeech 2018

Language(s) - English

Resource type - Conference proceedings

DOI - 10.21437/interspeech.2018-1750

Subject(s) - speaker diarisation, computer science, pipeline (software), speech recognition, cluster analysis, segmentation, artificial neural network, recurrent neural network, artificial intelligence, hierarchical clustering, speaker recognition, pattern recognition (psychology), programming language

Speaker diarization is the task of determining "who speaks when" in an audio stream. Most diarization systems rely on statistical models to address four sub-tasks: speech activity detection (SAD), speaker change detection (SCD), speech turn clustering, and re-segmentation. First, following the recent success of recurrent neural networks (RNNs) for SAD and SCD, we propose to address re-segmentation with Long Short-Term Memory (LSTM) networks. Then, we propose to use affinity propagation on top of neural speaker embeddings for speech turn clustering, outperforming regular Hierarchical Agglomerative Clustering (HAC). Finally, all these modules are combined and jointly optimized to form a speaker diarization pipeline in which all but the clustering step are based on RNNs. We provide experimental results on the French broadcast dataset ETAPE, where we reach state-of-the-art performance.
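
To make the clustering step more concrete, here is a minimal pure-Python sketch of affinity propagation applied to speech-turn embeddings. It is an illustration only, not the authors' implementation. The abstract does not say which similarity measure, preference value, or damping factor the paper uses, so the negative squared Euclidean similarity, the hand-picked `preference`, the 2-D toy "embeddings", and every function name below are assumptions made for the example. In the paper the embeddings come from a neural network rather than being written by hand.

```python
def neg_sq_dist(u, v):
    """Assumed similarity: negative squared Euclidean distance."""
    return -sum((a - b) ** 2 for a, b in zip(u, v))


def affinity_propagation(S, damping=0.5, n_iter=200):
    """Affinity propagation on a square similarity matrix S (list of lists).

    S[k][k] holds the preference of item k. Returns, for each item,
    the index of the exemplar it ends up assigned to.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(n_iter):
        # r(i, k) <- s(i, k) - max_{k' != k} [a(i, k') + s(i, k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = runner_up if k == best else scores[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(k, k) <- sum_{i' != k} max(0, r(i', k))
        # a(i, k) <- min(0, r(k, k) + sum_{i' not in {i, k}} max(0, r(i', k)))
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            positive[k] = 0.0
            support = sum(positive)
            for i in range(n):
                if i == k:
                    target = support
                else:
                    target = min(0.0, R[k][k] + support - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * target
    # Each item picks the candidate exemplar maximizing a(i, k) + r(i, k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


def cluster_speech_turns(embeddings, preference):
    """Build the similarity matrix and cluster the turns (hypothetical helper)."""
    n = len(embeddings)
    S = [[neg_sq_dist(embeddings[i], embeddings[j]) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        S[k][k] = preference  # lower values tend to yield fewer clusters
    return affinity_propagation(S)


# Hypothetical 2-D embeddings for six speech turns from two speakers.
turns = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
         (0.0, 100.0), (1.0, 100.0), (2.0, 100.0)]
labels = cluster_speech_turns(turns, preference=-50.0)
print(labels)
```

Unlike HAC, affinity propagation takes no target number of clusters and no stopping threshold on a merge tree. The number of speakers emerges from the message passing, steered by the preference placed on the diagonal of the similarity matrix. On this toy input the two groups of three turns should each settle on their middle point as exemplar.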
