MSVQ‐based speaker‐adaptive Chinese syllable recognition based on discriminative training
Author(s) - Zhou Liang, Imai Satoshi
Publication year - 1997
Publication title - International Journal of Adaptive Control and Signal Processing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.73
H-Index - 66
eISSN - 1099-1115
pISSN - 0890-6327
DOI - 10.1002/(sici)1099-1115(199711)11:7<569::aid-acs453>3.0.co;2-2
Subject(s) - speech recognition, computer science, discriminative model, normalization (sociology), pattern recognition (psychology), word error rate, artificial intelligence, speaker recognition, hidden markov model, sociology, anthropology
In this paper we present two supervised speaker adaptation methods, feature normalization and an MCE/GPD (minimum classification error / generalized probabilistic descent) algorithm, developed to implement an MSVQ‐based adaptive Chinese syllable recognition system. In MSVQ‐based speech recognition, each recognition unit is represented as a time sequence of codebooks. The first proposed method is feature normalization, in which we model inter‐speaker variability as a linear transformation; applying it normalizes the target speaker's speech and so reduces inter‐speaker acoustic variability. For the second adaptation method we first present an implementation of the MCE/GPD algorithm for discriminatively training an MSVQ‐based speech recognizer. This method is expected to separate confusable classes and to enhance speaker adaptation capability: the MSVQ‐based recognizer parameters are adjusted iteratively toward the objective of minimum classification error rate. We carried out recognition experiments on highly confusable Chinese syllables to assess the performance of both methods. On CRDB, a standard Chinese syllable database in China, the results show that when the two adaptation methods are combined, the error rate reduction on open data is over 62% with a single set of adaptation training data. Therefore, even when the amount of adaptation data is limited, the adaptation methods can lead to substantial improvement. As the amount of training data increases, speaker adaptation capability is improved by MCE/GPD training alone, so it can be used for tracking spectral evolution over time and provides a robust means for adaptive speech recognition. © 1997 John Wiley & Sons, Ltd.
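The abstract does not give the recognizer's equations, so the following is only a minimal sketch of the two ideas it names: scoring an utterance against a time sequence of codebooks, and one discriminative MCE/GPD-style update. Several details are assumptions rather than the authors' method: frames are assigned to sections by linear time segmentation, distortion is squared Euclidean distance to the nearest codeword, the loss is a sigmoid of the difference between the true-class distortion and the best rival's distortion, and only those two models are updated. All function and parameter names (`msvq_distortion`, `mce_gpd_step`, `lr`, `alpha`) are hypothetical. The feature normalization step is not covered here.

```python
import math

def sq_dist(x, c):
    """Squared Euclidean distance between a frame and a codeword."""
    return sum((a - b) ** 2 for a, b in zip(x, c))

def section_of(t, n_frames, n_sections):
    """Assumed linear time segmentation: map frame index t to a section index."""
    return min(t * n_sections // n_frames, n_sections - 1)

def nearest(x, codebook):
    """Index of the codeword in `codebook` closest to frame x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))

def msvq_distortion(frames, model):
    """Average nearest-codeword distortion of an utterance.

    `model` is a list of sections; each section is a codebook (list of codewords).
    """
    n = len(frames)
    total = 0.0
    for t, x in enumerate(frames):
        codebook = model[section_of(t, n, len(model))]
        total += sq_dist(x, codebook[nearest(x, codebook)])
    return total / n

def classify(frames, models):
    """Pick the label whose model gives the smallest distortion."""
    return min(models, key=lambda label: msvq_distortion(frames, models[label]))

def mce_gpd_step(frames, true_label, models, lr=0.1, alpha=1.0):
    """One gradient step on a sigmoid MCE loss (best rival only; an assumption).

    Returns the loss value computed before the update.
    """
    d_true = msvq_distortion(frames, models[true_label])
    rival = min((l for l in models if l != true_label),
                key=lambda l: msvq_distortion(frames, models[l]))
    d_rival = msvq_distortion(frames, models[rival])

    # Misclassification measure: positive means the rival scores better.
    z = max(-50.0, min(50.0, alpha * (d_true - d_rival)))
    loss = 1.0 / (1.0 + math.exp(-z))
    slope = alpha * loss * (1.0 - loss)  # derivative of the sigmoid loss

    # Collect all updates first so that every gradient uses the old codewords.
    n = len(frames)
    updates = []
    for label, sign in ((true_label, 1.0), (rival, -1.0)):
        model = models[label]
        for t, x in enumerate(frames):
            codebook = model[section_of(t, n, len(model))]
            updates.append((codebook[nearest(x, codebook)], x, sign))

    # True-class codewords move toward the frames, rival codewords move away.
    for c, x, sign in updates:
        for j in range(len(c)):
            c[j] += lr * slope * sign * 2.0 * (x[j] - c[j]) / n
    return loss
```

In this sketch `models` maps each syllable label to its list of section codebooks, and `frames` is one labelled adaptation utterance given as a list of feature vectors. Calling `mce_gpd_step` repeatedly over the adaptation data plays the role of the iterative parameter adjustment described in the abstract; the step size `lr` and sigmoid sharpness `alpha` would need tuning on real data.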