
Fast speaker clustering using distance of feature matrix mean and adaptive convergence threshold
Author(s) -
Li Yanxiong,
Jin Hai,
Li Wei,
He Qianhua,
Zhu Zhengyu,
Feng Xiaohui
Publication year - 2014
Publication title -
IET Signal Processing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.384
H-Index - 42
ISSN - 1751-9683
DOI - 10.1049/iet-spr.2013.0340
Subject(s) - cluster analysis, Bayesian information criterion, pattern recognition (psychology), convergence (economics), computer science, hierarchical clustering, feature (linguistics), artificial intelligence, feature vector, spectral clustering, correlation clustering, mathematics, linguistics, philosophy, economics, economic growth
The authors propose a fast speaker clustering method. A distance, the distance of feature matrix mean (DFMM), is first defined to characterise the similarity between any two clusters, and an adaptive convergence threshold is then introduced to terminate the clustering procedure. If the minimum DFMM over all pairs of clusters is smaller than the threshold, the corresponding two clusters are merged. This merging is repeated until the minimum DFMM over all pairs of clusters is larger than the threshold. The authors conduct experiments on both shorter voice segments (≤ 3 s) and longer voice segments (> 3 s) to compare their method with two state‐of‐the‐art methods: agglomerative hierarchical clustering with Bayesian information criterion (AHC + BIC) and vector quantisation with spectral clustering. The experiments show that their method achieves the best results for clustering shorter voice segments and obtains satisfactory results for clustering longer voice segments in comparison with the other two methods. Moreover, their method is faster than the other methods in all experimental cases. Initial results also show that hybrid methods combining their method with AHC + BIC obtain further improvement in terms of the F score.
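
The merge-until-threshold loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact DFMM formula or the rule for adapting the threshold, so the sketch assumes that DFMM is the Euclidean distance between the column means of two feature matrices (frames × dimensions) and takes the threshold as a fixed, caller-supplied number. The names `dfmm` and `cluster_segments` are hypothetical.

```python
import numpy as np


def dfmm(a, b):
    """Assumed stand-in for DFMM: Euclidean distance between the
    per-dimension means of two feature matrices (frames x dims)."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


def cluster_segments(segments, threshold):
    """Greedily merge the closest pair of clusters until the minimum
    pairwise distance is no longer below `threshold`.

    `threshold` is a fixed placeholder here; the paper's threshold is
    adaptive, but its rule is not stated in the abstract.
    Returns a list of clusters, each a list of input segment indices.
    """
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(clusters))]

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dfmm(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)

        d, i, j = best
        if d >= threshold:
            break  # convergence: no pair is close enough to merge

        # Merge cluster j into cluster i by stacking their frames.
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i] = members[i] + members[j]
        del clusters[j]
        del members[j]

    return members


if __name__ == "__main__":
    # Four toy "segments": two near 0 and two near 10 in feature space.
    toy = [
        np.zeros((5, 2)),
        np.zeros((4, 2)) + 0.1,
        np.full((6, 2), 10.0),
        np.full((3, 2), 10.2),
    ]
    print(cluster_segments(toy, threshold=1.0))
```

Because each iteration only compares mean vectors, one pass over all pairs costs far less than fitting and scoring statistical models per pair, which is consistent with the speed advantage the abstract reports, although the sketch makes no claim about the actual measured speed-up.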