

Open Access

I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification

Author(s) - Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda

Publication year - 2018

Publication title - Interspeech 2018

Language(s) - English

Resource type - Conference proceedings

DOI - 10.21437/interspeech.2018-1680

Subject(s) - discriminator, utterance, computer science, speaker verification, NIST, speech recognition, generator, artificial intelligence, transformation, adversarial system, pattern recognition, speaker recognition

I-vector based text-independent speaker verification (SV) systems often perform poorly on short utterances, because the biased phonetic distribution in a short utterance makes the extracted i-vector unreliable. This paper proposes an i-vector compensation method using a generative adversarial network (GAN). The generator network is trained to produce a compensated i-vector from a short-utterance i-vector, and the discriminator network is trained to determine whether an i-vector was produced by the generator or extracted from a long utterance. Additionally, we assign two other learning tasks to the GAN to stabilize its training and to make the generated i-vector more speaker-specific. Speaker verification experiments on the NIST SRE 2008 10sec-10sec condition show that our method reduced the equal error rate by 11.3% compared with the conventional i-vector and PLDA system.
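The adversarial setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the network architectures, the exact loss functions, or the two additional learning tasks, so everything below is an assumption: a one-layer linear generator, a logistic discriminator conditioned on the short-utterance i-vector by concatenation, the standard GAN cross-entropy losses (with the common non-saturating generator loss), and 4-dimensional toy vectors in place of real i-vectors. All function and parameter names are hypothetical, and the two auxiliary tasks are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def generator(params, short_ivec):
    """Hypothetical one-layer generator: maps a short-utterance i-vector
    to a compensated i-vector of the same dimension."""
    return [
        sum(w * x for w, x in zip(row, short_ivec)) + b
        for row, b in zip(params["W"], params["b"])
    ]


def discriminator(params, candidate, short_ivec):
    """Hypothetical conditional discriminator: returns the estimated
    probability that `candidate` was extracted from a long utterance,
    given the paired short-utterance i-vector as the condition."""
    features = candidate + short_ivec  # list concatenation = conditioning
    logit = sum(w * f for w, f in zip(params["w"], features)) + params["b"]
    return sigmoid(logit)


def adversarial_losses(g_params, d_params, short_ivec, long_ivec):
    """Standard GAN losses for one (short, long) i-vector pair.
    This loss form is an assumption; the abstract does not specify it."""
    fake = generator(g_params, short_ivec)
    p_real = discriminator(d_params, long_ivec, short_ivec)
    p_fake = discriminator(d_params, fake, short_ivec)
    # Discriminator: label long-utterance i-vectors 1, generated ones 0.
    d_loss = -math.log(p_real) - math.log(1.0 - p_fake)
    # Generator (non-saturating form): make generated i-vectors look real.
    g_loss = -math.log(p_fake)
    return d_loss, g_loss


# Toy data: real i-vectors typically have hundreds of dimensions.
DIM = 4
rng = random.Random(0)
g_params = {
    "W": [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in range(DIM)],
    "b": [0.0] * DIM,
}
d_params = {"w": [rng.gauss(0, 0.1) for _ in range(2 * DIM)], "b": 0.0}
short_ivec = [rng.gauss(0, 1) for _ in range(DIM)]
long_ivec = [rng.gauss(0, 1) for _ in range(DIM)]

d_loss, g_loss = adversarial_losses(g_params, d_params, short_ivec, long_ivec)
print(f"discriminator loss: {d_loss:.4f}, generator loss: {g_loss:.4f}")
```

In training, the two losses would be minimized alternately with respect to the discriminator and generator parameters; the sketch only evaluates them once to show how the long-utterance i-vector serves as the "real" example and the short-utterance i-vector serves as both generator input and discriminator condition.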
