Combining Speaker Turn Embedding and Incremental Structure Prediction for Low-Latency Speaker Diarization

Guillaume Wisniewski, Hervé Bredin, Grégory Gelly, Claude Barras

Open Access

Publication year - 2017

Publication title - Interspeech 2017

Language(s) - English

Resource type - Conference proceedings

DOI - 10.21437/interspeech.2017-1067

Subject(s) - speaker diarisation, computer science, speech recognition, embedding, latency (audio), speaker verification, speaker recognition, artificial intelligence, telecommunications, computer network

Real-time speaker diarization has many potential applications, including public security, biometrics, and forensics. It can also significantly speed up the indexing of increasingly large multimedia archives. In this paper, we address the issue of low-latency speaker diarization, which consists of continuously detecting new or recurring speakers within an audio stream and determining when each speaker is active with a low latency (e.g., every second). This contrasts with most existing approaches to speaker diarization, which rely on multiple passes over the complete audio recording. The proposed approach combines speaker turn neural embeddings with an incremental structure prediction approach inspired by state-of-the-art Natural Language Processing models for part-of-speech tagging and dependency parsing. It can therefore leverage both the information describing each utterance and the inherent temporal structure of interactions between speakers to learn, in a supervised framework, to identify speakers. Experiments on the Etape broadcast news benchmark validate the approach.
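
The abstract does not specify the features, scoring model, or training algorithm, so the following is only a hypothetical sketch of what "incremental structure prediction" over speaker turn embeddings can look like, not the authors' method. It assumes one fixed-size embedding per incoming segment (e.g., one per second) and, in the spirit of greedy transition-based taggers and parsers, makes one irreversible decision per segment: join an existing speaker or open a new one. The three features (similarity to the speaker centroid, "same speaker as the previous segment", new-speaker bias), the structured perceptron training that follows the reference history, and all names (`IncrementalDiarizer`, `NEW`, `features`, `diarize`, `train`) are illustrative assumptions.

```python
import math
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
NEW = -1  # hypothetical action id meaning "open a new speaker"


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class IncrementalDiarizer:
    """Hypothetical greedy left-to-right speaker assignment with a linear scorer.

    Each incoming segment embedding triggers exactly one irreversible action:
    join an existing speaker or open a new one (cf. transition-based parsing).
    """

    def __init__(self) -> None:
        # Weights for [similarity to centroid, same as previous speaker, new-speaker bias].
        self.w = [0.0, 0.0, 0.0]
        self.reset()

    def reset(self) -> None:
        self.sums: List[List[float]] = []  # running embedding sum per speaker
        self.counts: List[int] = []        # number of segments per speaker
        self.prev: Optional[int] = None    # speaker assigned to the previous segment

    def features(self, emb: Vector, action: int) -> List[float]:
        if action == NEW:
            return [0.0, 0.0, 1.0]
        centroid = [s / self.counts[action] for s in self.sums[action]]
        # Temporal structure: does this action continue the previous speaker's turn?
        return [cosine(emb, centroid), 1.0 if action == self.prev else 0.0, 0.0]

    def score(self, emb: Vector, action: int) -> float:
        return sum(w * f for w, f in zip(self.w, self.features(emb, action)))

    def predict(self, emb: Vector) -> int:
        actions = list(range(len(self.sums))) + [NEW]
        # max() keeps the first best action, so ties favour existing speakers.
        return max(actions, key=lambda a: self.score(emb, a))

    def apply(self, emb: Vector, action: int) -> int:
        """Commit an action, update the speaker state and return the speaker index."""
        if action == NEW:
            self.sums.append([float(x) for x in emb])
            self.counts.append(1)
            action = len(self.sums) - 1
        else:
            self.sums[action] = [s + x for s, x in zip(self.sums[action], emb)]
            self.counts[action] += 1
        self.prev = action
        return action

    def diarize(self, embs: Sequence[Vector]) -> List[int]:
        """Label a stream one segment at a time, never revisiting past decisions."""
        self.reset()
        return [self.apply(e, self.predict(e)) for e in embs]

    def train(self, streams: Sequence[Tuple[Sequence[Vector], Sequence[Hashable]]],
              epochs: int = 5) -> None:
        """Structured perceptron training that follows the reference (gold) history."""
        for _ in range(epochs):
            for embs, labels in streams:
                self.reset()
                cluster_of: Dict[Hashable, int] = {}  # reference label -> speaker index
                for emb, label in zip(embs, labels):
                    gold = cluster_of.get(label, NEW)
                    pred = self.predict(emb)
                    if pred != gold:
                        # Move weights toward the gold action's features, away from the prediction's.
                        fg, fp = self.features(emb, gold), self.features(emb, pred)
                        self.w = [w + g - p for w, g, p in zip(self.w, fg, fp)]
                    cluster_of[label] = self.apply(emb, gold)
```

For instance, after training on a single toy stream with orthogonal embeddings `[1.0, 0.0]` and `[0.0, 1.0]` labelled `a, a, b, b, a`, calling `diarize` on the same five embeddings returns `[0, 0, 1, 1, 0]`. Because every segment is labelled as soon as it arrives, latency is bounded by the segment length; a beam search or delayed decisions would trade some of that latency for the ability to revise earlier choices.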
