Open Access
DeltaVLAD: An efficient optimization algorithm to discriminate speaker embedding for text-independent speaker verification
Author(s) -
Xin Guo,
AUTHOR_ID,
Chengfang Luo,
Aiwen Deng,
Fujin Deng,
AUTHOR_ID
Publication year - 2022
Publication title -
AIMS Mathematics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.329
H-Index - 15
ISSN - 2473-6988
DOI - 10.3934/math.2022355
Subject(s) - softmax function , computer science , discriminative model , speech recognition , speaker recognition , embedding , speaker verification , speaker diarisation , pattern recognition (psychology) , subspace topology , frame (networking) , set (abstract data type) , coding (social sciences) , artificial intelligence , artificial neural network , mathematics , telecommunications , statistics , programming language
Text-independent speaker verification aims to determine whether two given utterances in an open-set task originate from the same speaker. This paper explores several ways to make speaker embeddings more discriminative. First, a difference operation is applied in the coding layer to process speaker features, forming the DeltaVLAD layer. The frame-level speaker representation extracted by the deep neural network is passed through differential operations that calculate the dynamic changes between frames, which helps capture subtle changes in the voiceprint. Meanwhile, NeXtVLAD is adopted to split the frame-level features into multiple word spaces before aggregation and then perform VLAD operations in each subspace, which can significantly reduce the number of parameters and improve performance. Second, a margin-based softmax loss function and a few-shot learning-based loss function are combined to obtain more discriminative speaker embeddings. Finally, for a fair comparison, experiments are conducted on VoxCeleb1; the results show superior speaker verification performance and new state-of-the-art results.
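
The idea of differencing frame-level features before VLAD aggregation can be illustrated with a small NumPy sketch. This is not the paper's implementation: the actual DeltaVLAD layer is trained inside a neural network and uses NeXtVLAD-style grouping, whereas the sketch below uses fixed cluster centers, plain soft-assignment VLAD, and a first-order difference. The function names `delta_features`, `vlad_aggregate`, and `delta_vlad`, the `alpha` sharpness parameter, and the normalization choices are all assumptions made for illustration.

```python
import numpy as np


def delta_features(frames):
    """First-order difference between consecutive frames: (T, D) -> (T-1, D).

    Assumption: a simple x[t+1] - x[t] difference stands in for the
    paper's differential operation.
    """
    return frames[1:] - frames[:-1]


def vlad_aggregate(frames, centers, alpha=1.0):
    """Soft-assignment VLAD: (T, D) frames and (K, D) centers -> (K*D,) vector.

    Assumption: centers are fixed here; in a trained layer they are learned.
    """
    # Residual of every frame to every center, shape (T, K, D).
    residuals = frames[:, None, :] - centers[None, :, :]
    # Soft assignment from negative squared distances, shape (T, K).
    logits = -alpha * np.sum(residuals ** 2, axis=2)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted sum of residuals per cluster, shape (K, D).
    vlad = np.einsum("tk,tkd->kd", weights, residuals)
    # Per-cluster (intra) normalization, then global L2 normalization.
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)
    flat = vlad.reshape(-1)
    return flat / (np.linalg.norm(flat) + 1e-12)


def delta_vlad(frames, centers, alpha=1.0):
    """Difference the frames first, then aggregate them with VLAD."""
    return vlad_aggregate(delta_features(frames), centers, alpha=alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(50, 8))   # 50 frames of 8-dim features (synthetic)
    centers = rng.normal(size=(4, 8))   # 4 hypothetical cluster centers
    embedding = delta_vlad(frames, centers)
    print(embedding.shape)
```

The output is a fixed-length, unit-norm vector whose size depends only on the number of centers and the feature dimension, not on the number of frames, which is what makes such an aggregation usable as an utterance-level embedding.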