
Sentence‐HMM state‐based i‐vector/PLDA modelling for improved performance in text dependent single utterance speaker verification
Author(s) - Büyük Osman
Publication year - 2016
Publication title - IET Signal Processing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.384
H-Index - 42
eISSN - 1751-9683
pISSN - 1751-9675
DOI - 10.1049/iet-spr.2015.0288
Subject(s) - hidden markov model , computer science , speech recognition , sentence , artificial intelligence , mixture model , utterance , pattern recognition (psychology) , word error rate , speaker recognition , task (project management) , speaker verification , management , economics
In this paper, we use hidden Markov model (HMM) state alignment information in the i‐vector/probabilistic linear discriminant analysis (PLDA) framework to improve verification performance in a text‐dependent single utterance (TDSU) task. In the TDSU task, speakers repeat a fixed utterance in both the enrollment and authentication sessions. Although Gaussian mixture models (GMMs) have been the dominant modeling technique for text‐independent applications, an HMM‐based method might be better suited to the TDSU task because it captures co‐articulation information better. Recently, powerful channel compensation techniques such as joint factor analysis (JFA), i‐vectors and PLDA have been proposed for GMM‐based text‐independent speaker verification. In this study, we train a separate i‐vector/PLDA model for each sentence HMM state in order to exploit the HMM state alignment information in the TDSU task. The proposed method is tested on a multi‐channel speaker verification database. In the experiments, the HMM state‐based i‐vector/PLDA (i‐vector/PLDA‐HMM) provides an approximately 67% relative reduction in equal error rate (EER) compared to i‐vector/PLDA. The proposed method also outperforms the baseline GMM and sentence HMM methods, yielding an approximately 51% relative reduction in EER over the best‐performing sentence HMM method.
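The abstract reports its results as relative reductions in EER. As a minimal sketch of how such figures are typically derived, the Python below estimates an EER from two score lists and computes a relative reduction between two EER values. The function names, the threshold sweep and all numbers are illustrative assumptions; they are not taken from the paper, whose scoring tools and absolute EER values are not given here.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER, assuming a higher score means 'more likely the target'."""
    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance rate: impostor trials accepted at threshold t.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: target trials rejected at threshold t.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


def relative_reduction(baseline_eer, proposed_eer):
    """Relative EER reduction of a proposed system over a baseline."""
    return (baseline_eer - proposed_eer) / baseline_eer


# Toy scores, invented purely for illustration.
toy_targets = [2.0, 1.5, 0.9, 0.2]
toy_impostors = [1.0, 0.1, -0.3, -1.2]
toy_eer = equal_error_rate(toy_targets, toy_impostors)

# Hypothetical EERs (not the paper's): 3% baseline versus 1% proposed.
toy_gain = relative_reduction(0.03, 0.01)
```

With the hypothetical values above, a drop from 3% to 1% EER is a relative reduction of about 67%, which shows how a figure of that size can arise without implying anything about the paper's actual EERs.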