Open Access
A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks
Author(s) -
Takenori Yoshimura,
Gustav Eje Henter,
Oliver Watts,
Mirjam Wester,
Junichi Yamagishi,
Keiichi Tokuda
Publication year - 2016
Publication title - Interspeech 2016
Language(s) - English
Resource type - Conference proceedings
DOI - 10.21437/interspeech.2016-847
Subject(s) - naturalness , computer science , speech synthesis , speech recognition , convolutional neural network , artificial neural network , artificial intelligence , machine learning
A problem when developing and tuning speech synthesis systems is that there is no well-established method of automatically rating the quality of the synthetic speech. This research attempts to obtain a new automated measure which is trained on the results of large-scale subjective evaluations employing many human listeners, i.e., the Blizzard Challenge. To exploit the data, we experiment with linear regression, feed-forward and convolutional neural network models, and combinations of them, to regress from synthetic speech to the perceptual scores obtained from listeners. The biggest improvements were seen when combining stimulus- and system-level predictions.
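The abstract's central idea of combining stimulus-level and system-level predictions can be illustrated with a minimal sketch. This is not the authors' implementation: the data here is synthetic, the features and MOS scores are simulated, and plain least-squares stands in for the neural models studied in the paper. The sketch only shows the two prediction levels and their combination.

```python
import numpy as np

# Hypothetical toy setup: feature vectors for synthetic-speech stimuli and
# simulated subjective naturalness (MOS-like) scores. In the paper, scores
# come from Blizzard Challenge listening tests, not from a linear model.
rng = np.random.default_rng(0)
n_stimuli, n_features, n_systems = 120, 8, 4

X = rng.normal(size=(n_stimuli, n_features))            # per-stimulus features
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.1 * rng.normal(size=n_stimuli)       # simulated listener scores
system_id = rng.integers(0, n_systems, size=n_stimuli)  # which TTS system made each stimulus

# Stimulus-level predictor: ordinary least squares as a stand-in
# for the paper's regression models.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
stimulus_pred = X @ w

# System-level predictor: pool stimulus-level predictions per system and
# use the per-system mean as a shared prediction for all of its stimuli.
system_mean = {s: stimulus_pred[system_id == s].mean() for s in range(n_systems)}
system_pred = np.array([system_mean[s] for s in system_id])

# Combine the two levels; a simple average is used here purely for illustration.
combined = 0.5 * (stimulus_pred + system_pred)
```

The system-level average smooths out per-stimulus noise, which is one intuition for why combining the two granularities can outperform either alone.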
