A Study of Bilinear Models in Voice Conversion
Author(s) -
Victor Popa,
Jani Nurminen,
Moncef Gabbouj
Publication year - 2011
Publication title -
Journal of Signal and Information Processing
Language(s) - English
Resource type - Journals
eISSN - 2159-4465
pISSN - 2159-4481
DOI - 10.4236/jsip.2011.22017
Subject(s) - bilinear interpolation , computer science , context (archaeology) , representation (politics) , feature (linguistics) , speech recognition , identity (music) , dimension (graph theory) , artificial intelligence , pattern recognition (psychology) , mathematics , linguistics , acoustics , paleontology , philosophy , physics , politics , political science , pure mathematics , law , computer vision , biology
This paper presents a voice conversion technique based on bilinear models and introduces the concept of contextual modeling. The bilinear approach reformulates the spectral envelope representation from line spectral frequency (LSF) features into a two-factor parameterization corresponding to speaker identity and phonetic information, the so-called style and content factors. This decomposition offers a flexible representation suitable for voice conversion and facilitates the use of efficient training algorithms based on singular value decomposition. In the contextual approach, (bilinear) models are trained on subsets of the training data selected on the fly at conversion time, depending on the characteristics of the feature vector to be converted. The performance of bilinear models and context modeling is evaluated in objective and perceptual tests against the popular GMM-based voice conversion method, for several sizes and different types of training data.
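The style/content factorization and its SVD-based training can be illustrated with a minimal sketch of an asymmetric bilinear model. This is not necessarily the authors' exact procedure: it assumes time-aligned feature vectors (for example LSF vectors) from each speaker stacked vertically into one matrix, with one column per aligned frame, and the names `fit_asymmetric_bilinear`, `convert`, and the model dimension `J` are hypothetical.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, n_styles, dim, J):
    """Fit y^{s,c} ~= A^s b^c with a truncated SVD.

    Y is assumed to have shape (n_styles * dim, n_contents): the feature
    vectors of all speakers (styles) stacked vertically, one column per
    aligned frame (content). Returns A with shape (n_styles, dim, J) and
    B with shape (J, n_contents).
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Keep the J leading components; fold the singular values into A.
    A = (U[:, :J] * s[:J]).reshape(n_styles, dim, J)
    B = Vt[:J, :]
    return A, B

def convert(A_src, A_tgt, y_src):
    """Map one source-speaker vector to the target speaker.

    The content factor is estimated by least squares under the source
    style matrix, then rendered with the target style matrix.
    """
    b, *_ = np.linalg.lstsq(A_src, y_src, rcond=None)
    return A_tgt @ b

# Toy usage with synthetic data (assumed sizes, not taken from the paper).
rng = np.random.default_rng(0)
n_styles, dim, J, n_frames = 2, 10, 4, 50
Y = rng.normal(size=(n_styles * dim, n_frames))
A, B = fit_asymmetric_bilinear(Y, n_styles, dim, J)
y_converted = convert(A[0], A[1], Y[:dim, 0])
```

In a contextual variant, one could call `fit_asymmetric_bilinear` on only those columns of `Y` selected as relevant to the vector being converted; the selection criterion is left open here because the abstract does not specify it.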