
Discriminatively learned network for i‐vector based speaker recognition
Author(s) -
Yao Shengyu,
Zhou Ruohua,
Zhang Pengyuan,
Yan Yonghong
Publication year - 2018
Publication title -
Electronics Letters
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.375
H-Index - 146
ISSN - 1350-911X
DOI - 10.1049/el.2018.6359
Subject(s) - computer science, discriminative model, pattern recognition, speaker recognition, artificial intelligence, probabilistic logic, speech recognition, embedding, linear discriminant analysis, Gaussian, machine learning, acoustics
In many i‐vector based speaker recognition frameworks, the key challenge is to develop effective channel compensation methods that enlarge inter‐class differences while reducing intra‐class variations. This challenge is addressed with a discriminatively learned network (DLN), which uses both speaker classification and verification signals as supervision. The speaker classification task pushes the embeddings (vectors mapped from i‐vectors) of different identities apart to increase the inter‐class variation, while the verification task pulls the embeddings of the same identity together to reduce the intra‐class variation. The DLN thus projects i‐vectors into a more discriminative embedding space, and verification scores are computed as cosine similarities between these embeddings. The learned DLN generalises well to new speakers unseen in the training data. On the challenging text‐dependent Robust Speaker Recognition 2015 (RSR2015) database, performance is significantly improved compared with the linear discriminant analysis (LDA) and Gaussian probabilistic LDA methods.
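
As a rough illustration of the joint supervision described in the abstract, the sketch below pairs a speaker-classification (cross-entropy) signal with a contrastive verification signal over pairs of i-vectors, then scores trials by cosine similarity between the learned embeddings. The network architecture, layer sizes, margin, loss weighting, and the use of PyTorch are assumptions made for illustration; they are not taken from the letter.

# Minimal sketch, assuming PyTorch; all hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DLN(nn.Module):
    """Maps i-vectors to discriminative embeddings (hypothetical architecture)."""
    def __init__(self, ivector_dim=400, embed_dim=200, num_speakers=1000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ivector_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # Classification head used only during training (speaker classification signal).
        self.classifier = nn.Linear(embed_dim, num_speakers)

    def forward(self, ivectors):
        return self.net(ivectors)

def joint_loss(model, iv_a, iv_b, spk_a, spk_b, margin=0.5, verif_weight=1.0):
    """Classification term pushes different identities apart; verification term
    (contrastive form, an assumption) pulls same-identity embeddings together."""
    emb_a, emb_b = model(iv_a), model(iv_b)
    # Speaker-classification signal on both members of the pair.
    cls = (F.cross_entropy(model.classifier(emb_a), spk_a)
           + F.cross_entropy(model.classifier(emb_b), spk_b))
    # Verification signal: small distance for same speaker, at least `margin` otherwise.
    same = (spk_a == spk_b).float()
    dist = F.pairwise_distance(emb_a, emb_b)
    verif = same * dist.pow(2) + (1.0 - same) * F.relu(margin - dist).pow(2)
    return cls + verif_weight * verif.mean()

def verification_score(model, enroll_iv, test_iv):
    """Score a trial as the cosine similarity between enrolment and test embeddings."""
    with torch.no_grad():
        return F.cosine_similarity(model(enroll_iv), model(test_iv), dim=-1)

Note that scoring uses only the embedding network and cosine similarity, not the classification head, which is consistent with the claim that the learned mapping can be applied to speakers unseen during training.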