A Triplet Ranking-Based Neural Network for Speaker Diarization and Linking
Author(s) -
Gaël Le Lan,
Delphine Charlet,
Anthony Larcher,
Sylvain Meignier
Publication year - 2017
Publication title -
Interspeech 2017
Language(s) - English
Resource type - Conference proceedings
DOI - 10.21437/interspeech.2017-270
Subject(s) - speaker diarisation , cosine similarity , computer science , artificial neural network , similarity (geometry) , speech recognition , artificial intelligence , ranking (information retrieval) , linear discriminant analysis , speaker recognition , speaker verification , probabilistic logic , word error rate , pattern recognition (psychology) , image (mathematics)
This paper investigates a novel neural scoring method, based on conventional i-vectors, for performing speaker diarization and linking across large collections of recordings. Trained with a triplet loss, the network projects i-vectors into a space that better separates speakers in terms of cosine similarity. Experiments are run on two French TV collections built from the REPERE [1] and ETAPE [2] campaign corpora, with the system trained on French radio data. Results indicate that the proposed approach outperforms conventional cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring on both within- and cross-recording diarization tasks, reducing the Diarization Error Rate by 14% on average.
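The core idea in the abstract is to train a projection of i-vectors so that, under cosine similarity, a same-speaker pair scores higher than a different-speaker pair. The pure-Python sketch below illustrates that idea. It assumes a single linear projection and a hinge-style triplet loss with a margin of 0.2. The function names, the margin value, and the linear form of the network are illustrative assumptions, not the architecture or hyperparameters reported in the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, x: Sequence[float]) -> Vector:
    """Apply a hypothetical linear projection W @ x to an i-vector x.

    Assumption: the paper's network is stood in for by one linear layer.
    """
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def triplet_loss(weights: Matrix, anchor: Sequence[float],
                 positive: Sequence[float], negative: Sequence[float],
                 margin: float = 0.2) -> float:
    """Hinge-style triplet ranking loss on cosine similarity.

    The loss is zero once, in the projected space, the anchor is more
    similar to the positive (same speaker) than to the negative
    (different speaker) by at least `margin` (an assumed value).
    """
    a = project(weights, anchor)
    p = project(weights, positive)
    n = project(weights, negative)
    return max(0.0, margin - cosine(a, p) + cosine(a, n))


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Easy triplet: the positive is much closer to the anchor than the negative.
    print(triplet_loss(identity, [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]))  # prints 0.0
    # Hard triplet: the negative is closer to the anchor, so the loss is positive.
    print(triplet_loss(identity, [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))
```

Under this sketch, training would adjust `weights` to minimise the mean loss over many sampled triplets, and the projected i-vectors would then be compared with plain cosine similarity when scoring segments for clustering.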