
Voice Conversion Based on Deep Neural Networks for Time-Variant Linear Transformations
Author(s) -
Gaku Kotani,
Daisuke Saito,
Nobuaki Minematsu
Publication year - 2022
Publication title -
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.916
H-Index - 56
eISSN - 2329-9304
pISSN - 2329-9290
DOI - 10.1109/taslp.2022.3205755
Subject(s) - signal processing and analysis, computing and processing, communication, networking and broadcast technologies, general topics for engineers
This paper describes a novel voice conversion (VC) framework that improves conversion performance relative to the amount of training data. In VC, deep neural networks serve as conversion models that map source features to target features; building more accurate models generally requires larger amounts of training data and larger networks, which reduces the usability of VC. To improve the trade-off between conversion performance and data size, top-down knowledge is introduced into the models as a prior, with the expectation that such knowledge can substitute for collecting a large amount of data. In the proposed method, the conversion of features is restricted to a time-variant linear transformation in cepstral space. This explicitly exploits an attribute of VC that is not shared by automatic speech recognition or text-to-speech synthesis, namely homo-domain mapping: in VC, the input and output lie in the same feature domain. The restriction also makes it possible to explicitly account for physical differences between speakers, such as differences in vocal tract length. Because the homo-domain assumption is related to conversion methods based on spectral differentials, this relation is also discussed in the paper. Experiments demonstrate the effectiveness of the proposal, and how the constraint of linear transformation works is investigated.
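
Since the abstract centers on restricting the DNN's output to a per-frame (time-variant) linear transformation in cepstral space, the sketch below illustrates one way such a model could be structured: a frame-wise network predicts a matrix A_t and bias b_t, which are then applied to the source cepstra as y_t = A_t x_t + b_t. The architecture, layer sizes, and training loss (MSE on time-aligned frames) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (assumed architecture, not the authors' exact model):
# a DNN predicts a time-variant linear transformation per frame of source
# cepstra and applies it, so the output stays in the same cepstral domain.
import torch
import torch.nn as nn


class TimeVariantLinearVC(nn.Module):
    def __init__(self, cep_dim: int = 40, hidden_dim: int = 256):
        super().__init__()
        self.cep_dim = cep_dim
        # Frame-wise encoder of the source cepstra (sizes are illustrative).
        self.encoder = nn.Sequential(
            nn.Linear(cep_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # Heads predicting the per-frame transformation parameters.
        self.matrix_head = nn.Linear(hidden_dim, cep_dim * cep_dim)  # A_t
        self.bias_head = nn.Linear(hidden_dim, cep_dim)              # b_t

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, cep_dim) source cepstral features.
        h = self.encoder(x)
        A = self.matrix_head(h).view(*x.shape[:2], self.cep_dim, self.cep_dim)
        b = self.bias_head(h)
        # Apply the predicted transformation frame by frame:
        # y_t = A_t x_t + b_t  (homo-domain mapping: y remains cepstral).
        return torch.einsum("btij,btj->bti", A, x) + b


if __name__ == "__main__":
    model = TimeVariantLinearVC(cep_dim=40)
    src = torch.randn(2, 100, 40)   # dummy source cepstra
    tgt = torch.randn(2, 100, 40)   # dummy time-aligned target cepstra
    loss = nn.functional.mse_loss(model(src), tgt)
    loss.backward()                 # ordinary DNN training would follow
    print(loss.item())
```

One consequence of this parameterization is that the network's capacity is spent on predicting transformation parameters rather than target features directly, which is how the top-down prior (linearity in cepstral space) constrains the mapping.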