A fuzzy‐clustering‐based hierarchical i‐vector/probabilistic linear discriminant analysis system for text‐dependent speaker verification

Author(s) - Laskar Mohammad Azharuddin, Laskar Rabul Hussain

Publication year - 2020

Publication title - Expert Systems

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.365

H-Index - 38

eISSN - 1468-0394

pISSN - 0266-4720

DOI - 10.1111/exsy.12496

Subject(s) - computer science, feature vector, pattern recognition (psychology), linear subspace, artificial intelligence, speech recognition, subspace topology, linear discriminant analysis, cluster analysis, speaker recognition, mathematics, geometry

In the i‐vector/probabilistic linear discriminant analysis (PLDA) technique, the PLDA backend classifier is modelled on i‐vectors. PLDA defines an i‐vector subspace that compensates for unwanted variability and helps to discriminate among speaker‐phrase pairs. The channel or session variability manifested in i‐vectors is known to be nonlinear. PLDA training, however, assumes the variability to be linearly separable, thereby causing a loss of important discriminating information. Moreover, i‐vector estimation itself is known to be poor for short utterances. This paper attempts to address these issues using a simple hierarchy‐based system. A modified fuzzy‐clustering technique is employed to divide the feature space into more characteristic feature subspaces using vocal source features. Thereafter, a separate i‐vector/PLDA model is trained for each of the subspaces. The sparser alignment owing to the subspace‐specific universal background model and the relatively reduced dimensions of variability in individual subspaces help to train more effective i‐vector/PLDA models. Also, vocal source features are complementary to mel frequency cepstral coefficients, which are transformed into i‐vectors using a mixture model technique. As a consequence, vocal source features and i‐vectors tend to carry complementary information. Thus, using vocal source features for classification in a hierarchy tree may help to differentiate some of the speaker‐phrase classes that are otherwise not easily discriminable based on i‐vectors. The proposed technique has been validated on Part 1 of the RSR2015 database, and it shows a relative equal error rate reduction of up to 37.41% with respect to the baseline i‐vector/PLDA system.
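
The hierarchy described in the abstract can be illustrated with a minimal sketch. It assumes the textbook fuzzy c-means membership rule as a stand-in, because the abstract does not specify the authors' modified fuzzy-clustering technique. The function names, the fuzzifier `m`, and the idea of routing an utterance to the subspace with the highest membership are all assumptions made for illustration; they are not taken from the paper. The last helper shows how a relative equal error rate (EER) reduction is conventionally computed.

```python
import math


def fuzzy_memberships(x, centers, m=2.0):
    """Textbook fuzzy c-means membership of point x in each cluster.

    Assumption: this is the standard FCM rule, not the paper's modified
    technique. x would be a vocal source feature vector; centers are the
    cluster centers that define the feature subspaces.
    """
    dists = [math.dist(x, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


def route_to_subspace(x, centers, m=2.0):
    """Index of the subspace with the highest membership (hypothetical routing).

    In a hierarchical system, this index would select which
    subspace-specific i-vector/PLDA model scores the utterance.
    """
    u = fuzzy_memberships(x, centers, m)
    return max(range(len(u)), key=u.__getitem__)


def relative_eer_reduction(eer_baseline, eer_proposed):
    """Relative EER reduction, in percent, with respect to the baseline."""
    return 100.0 * (eer_baseline - eer_proposed) / eer_baseline


# Hypothetical two-dimensional features and centers, for illustration only.
centers = [[0.0, 0.0], [1.0, 1.0]]
utterance_features = [0.1, 0.0]
subspace = route_to_subspace(utterance_features, centers)
```

Here `subspace` is the index of the nearer center, since memberships grow as the distance to a center shrinks. A soft variant could instead weight the scores of several subspace models by their memberships; whether the paper does so is not stated in the abstract.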
