Open Access
Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network
Author(s) - Amelia Gully, Takenori Yoshimura, Damian Murphy, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
Publication year - 2017
Publication title - Interspeech 2017
Language(s) - English
Resource type - Conference proceedings
DOI - 10.21437/interspeech.2017-900
Subject(s) - vocal tract, computer science, speech recognition, speech synthesis, artificial neural network, flexibility (engineering), waveform, speech processing, speech production, artificial intelligence, telecommunications, radar, statistics, mathematics
Following recent advances in direct modeling of the speech waveform with deep neural networks, we propose a method that estimates a physical model of the vocal tract directly from the speech waveform rather than from magnetic resonance imaging data. This provides a clear relationship between the model and the size and shape of the vocal tract, offering considerable flexibility over speech characteristics such as age and gender. Initial tests indicate that, despite a highly simplified physical model, intelligible synthesized speech is obtained. This illustrates the potential of the combined technique for controlling physical models in general, and hence for generating more natural-sounding synthetic speech.
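To make the physical model concrete: a digital waveguide vocal tract represents the tract as a chain (or, in the paper, a mesh) of acoustic tube sections whose cross-sectional areas determine reflection coefficients at scattering junctions; a glottal source injected at one end propagates through these junctions and radiates at the lips. The sketch below is a minimal 1-D Kelly-Lochbaum waveguide in Python, a deliberate simplification of the paper's dynamic 2-D waveguide mesh. The area function, boundary reflection values, and function names are illustrative assumptions; in the proposed system the tract parameters would be estimated frame by frame by the DNN rather than fixed by hand.

```python
# Minimal 1-D digital waveguide (Kelly-Lochbaum) vocal tract sketch.
# Illustrative only: the paper uses a dynamic 2-D digital waveguide mesh,
# and its tract parameters are estimated by a DNN from the speech waveform.
# Here the area function and boundary reflections are arbitrary assumptions.
import numpy as np

def synthesize(areas, source, lip_reflect=-0.85, glottis_reflect=0.75):
    """Propagate a glottal source through tube sections defined by `areas`."""
    n = len(areas)
    # Reflection coefficient at each junction between adjacent sections:
    # k_i = (A_i - A_{i+1}) / (A_i + A_{i+1}) for pressure waves.
    k = (areas[:-1] - areas[1:]) / (areas[:-1] + areas[1:])
    fwd = np.zeros(n)           # right-travelling pressure waves
    bwd = np.zeros(n)           # left-travelling pressure waves
    out = np.zeros(len(source))
    for t, s in enumerate(source):
        new_fwd = np.empty(n)
        new_bwd = np.empty(n)
        # Glottis end: inject the source plus a partial reflection.
        new_fwd[0] = s + glottis_reflect * bwd[0]
        # Kelly-Lochbaum scattering at each interior junction.
        for i in range(n - 1):
            new_fwd[i + 1] = (1 + k[i]) * fwd[i] - k[i] * bwd[i + 1]
            new_bwd[i] = k[i] * fwd[i] + (1 - k[i]) * bwd[i + 1]
        # Lip end: partial reflection; the remainder radiates as output.
        new_bwd[n - 1] = lip_reflect * fwd[n - 1]
        out[t] = (1 + lip_reflect) * fwd[n - 1]
        fwd, bwd = new_fwd, new_bwd
    return out

if __name__ == "__main__":
    fs = 16000
    areas = np.array([2.0, 1.5, 1.0, 0.8, 1.2, 2.5, 3.0])  # cm^2, arbitrary
    n_samples = fs // 10                                     # 100 ms of audio
    source = (np.arange(n_samples) % 160 == 0).astype(float)  # ~100 Hz pulses
    audio = synthesize(areas, source)
```

With only seven fixed sections this produces a static, vowel-like resonance; the appeal of the approach described in the abstract is that a DNN can drive time-varying tract geometry, tying the synthesis parameters to physically meaningful quantities such as tract size and shape.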
