Open Access
A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks
Author(s) -
Takenori Yoshimura,
Gustav Eje Henter,
Oliver Watts,
Mirjam Wester,
Junichi Yamagishi,
Keiichi Tokuda
Publication year - 2016
Publication title - Interspeech 2016
Language(s) - English
Resource type - Conference proceedings
DOI - 10.21437/interspeech.2016-847
Subject(s) - naturalness , computer science , speech synthesis , speech recognition , convolutional neural network , artificial neural network , artificial intelligence , machine learning
A problem when developing and tuning speech synthesis systems is that there is no well-established method of automatically rating the quality of the synthetic speech. This research attempts to obtain a new automated measure which is trained on the results of large-scale subjective evaluations employing many human listeners, i.e., the Blizzard Challenge. To exploit the data, we experiment with linear regression, feed-forward and convolutional neural network models, and combinations of them, to regress from synthetic speech to the perceptual scores obtained from listeners. The biggest improvements were seen when combining stimulus- and system-level predictions.
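The abstract's central idea of combining stimulus-level and system-level predictions can be illustrated with a minimal sketch. This is not the authors' implementation: the data here is synthetic, the features and MOS scores are simulated, and plain least-squares stands in for the neural models studied in the paper. The sketch only shows the two prediction levels and their combination.

```python
import numpy as np

# Hypothetical toy setup: feature vectors for synthetic-speech stimuli and
# simulated subjective naturalness (MOS-like) scores. In the paper, scores
# come from Blizzard Challenge listening tests, not from a linear model.
rng = np.random.default_rng(0)
n_stimuli, n_features, n_systems = 120, 8, 4

X = rng.normal(size=(n_stimuli, n_features))            # per-stimulus features
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.1 * rng.normal(size=n_stimuli)       # simulated listener scores
system_id = rng.integers(0, n_systems, size=n_stimuli)  # which TTS system made each stimulus

# Stimulus-level predictor: ordinary least squares as a stand-in
# for the paper's regression models.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
stimulus_pred = X @ w

# System-level predictor: pool stimulus-level predictions per system and
# use the per-system mean as a shared prediction for all of its stimuli.
system_mean = {s: stimulus_pred[system_id == s].mean() for s in range(n_systems)}
system_pred = np.array([system_mean[s] for s in system_id])

# Combine the two levels; a simple average is used here purely for illustration.
combined = 0.5 * (stimulus_pred + system_pred)
```

The system-level average smooths out per-stimulus noise, which is one intuition for why combining the two granularities can outperform either alone.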
