Open Access
What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS
Author(s) - Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber
Publication year - 2020
Publication title - Interspeech 2020
Language(s) - English
Resource type - Conference proceedings
DOI - 10.21437/interspeech.2020-2103
Subject(s) - security token , computer science , context (archaeology) , sequence (biology) , speech recognition , speech synthesis , sentence , encoder , word (group theory) , representation (politics) , artificial intelligence , mathematics , paleontology , genetics , geometry , computer security , politics , political science , law , biology , operating system
In incremental text-to-speech synthesis (iTTS), the synthesizer produces audio output before it has access to the entire input sentence. In this paper, we study the behavior of a neural sequence-to-sequence TTS system when used in an incremental mode, i.e., when generating speech output for token n, the system has access to n + k tokens from the text sequence. We first analyze the impact of this incremental policy on the evolution of the encoder representations of token n for different values of k (the lookahead parameter). The results show that, on average, tokens travel 88% of the way to their full-context representation with a one-word lookahead and 94% with a two-word lookahead. We then use a random forest analysis to investigate which text features most influence the evolution towards the final representation. The results show that the most salient factors are related to token length. Finally, we evaluate the effects of the lookahead k at the decoder level, using a MUSHRA listening test. This test shows results that contrast with the above high figures: speech synthesis quality obtained with a two-word lookahead is significantly lower than that obtained with the full sentence.
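The lookahead policy and the idea of a token "travelling" towards its full-context representation can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not define the distance measure, so the sketch uses a Euclidean distance ratio, and `toy_encoder`, `embed`, `lookahead_window` and `progress` are hypothetical names, not part of the authors' system. The toy encoder is a stand-in for a real neural encoder; it only mimics the property that a token's representation depends on the visible context.

```python
import zlib
import numpy as np

def lookahead_window(tokens, n, k):
    """Tokens visible when synthesizing token n (0-based) with lookahead k."""
    return tokens[: min(n + k + 1, len(tokens))]

def embed(token, dim=8):
    """Deterministic pseudo-random embedding (hypothetical, for the demo only)."""
    rng = np.random.default_rng(zlib.crc32(token.encode("utf-8")))
    return rng.standard_normal(dim)

def toy_encoder(tokens, dim=8):
    """Stand-in encoder: own embedding plus the mean over all visible tokens."""
    E = np.stack([embed(t, dim) for t in tokens])
    return E + E.mean(axis=0)

def progress(tokens, n, k):
    """Assumed metric: fraction of the distance from the k=0 representation
    of token n to its full-sentence representation covered at lookahead k."""
    r0 = toy_encoder(lookahead_window(tokens, n, 0))[n]
    rk = toy_encoder(lookahead_window(tokens, n, k))[n]
    r_full = toy_encoder(tokens)[n]
    total = np.linalg.norm(r_full - r0)
    if total == 0.0:  # token n already sees the whole sentence at k=0
        return 1.0
    return 1.0 - np.linalg.norm(r_full - rk) / total

tokens = "the cat sat on the mat".split()
for k in range(0, 4):
    print(k, round(progress(tokens, 1, k), 3))
```

With a real model, `toy_encoder` would be replaced by the TTS encoder run on each truncated input, and the ratio averaged over many tokens. With this toy encoder the value is not guaranteed to increase monotonically with k, so the printed numbers only illustrate the bookkeeping, not the paper's 88% and 94% figures.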
