Speaker Diarization for Multi-microphone Meetings Using Only Between-Channel Differences
Author(s) -
José Manuel Pardo,
Xavier Anguera,
Chuck Wooters
Publication year - 2006
Publication title -
lecture notes in computer science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
ISBN - 3-540-69267-3
DOI - 10.1007/11965152_23
Subject(s) - nist , speaker diarisation , computer science , microphone , channel (broadcasting) , speech recognition , set (abstract data type) , word error rate , segmentation , speaker recognition , artificial intelligence , telecommunications , sound pressure , programming language
We present a method to extract speaker turn segmentation from multiple distant microphones (MDM) using only delay values found via a cross-correlation between the available channels. The method is robust against the number of speakers (which is unknown to the system), the number of channels, and the acoustics of the room. The delays between channels are processed and clustered to obtain a segmentation hypothesis. We have obtained a 31.2% diarization error rate (DER) for the NIST´s RT05s MDM conference room evaluation set. For a MDM subset of NIST´s RT04s development set, we have obtained 36.93% DER and 35.73% DER*. Comparing those results with the ones presented by Ellis and Liu [8], who also used between-channels differences for the same data, we have obtained 43% relative improvement in the error rate.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom