A Phoneme Sequence Driven Lightweight End-To-End Speech Synthesis Approach
Author(s) - Zite Jiang, Feiwei Qin, Liaoying Zhao
Publication year - 2019
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1267/1/012052
Subject(s) - computer science , spectrogram , speech recognition , sequence (biology) , encoder , waveform , character (mathematics) , feature (linguistics) , artificial neural network , speech synthesis , sequence labeling , end to end principle , artificial intelligence , acoustic model , natural language processing , speech processing , engineering , mathematics , telecommunications , radar , linguistics , genetics , geometry , philosophy , systems engineering , biology , task (project management) , operating system
This paper develops an end-to-end neural network model for text-to-speech (TTS) synthesis driven by phoneme sequences. Inspired by Tacotron-2, the proposed model adopts an encoder-decoder architecture with an attention mechanism and uses the mel-spectrogram as the intermediate acoustic feature. A phoneme sequence replaces the character sequence to overcome the shortcomings of the character features used in Tacotron-2. Unlike conventional concatenative TTS systems, the model generates a waveform directly from the phoneme sequence. In addition, by analogy with text analysis, a new methodology is proposed for phoneme analysis. Experimental results on the LJ Speech dataset show that, compared with the character-based model, the proposed model achieves comparable or better performance.
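
To make the character-versus-phoneme distinction concrete, the following is a minimal sketch of how the two input representations might be prepared before they reach an encoder. The abstract does not specify the paper's phoneme set, lexicon, or analysis procedure, so everything here is an assumption for illustration: the three-word ARPAbet-style lexicon `TOY_LEXICON`, the helper names, and the use of a space symbol as a word-boundary marker are all hypothetical.

```python
# Illustrative only: a toy lexicon with ARPAbet-style pronunciations.
# A real system would use a full pronunciation dictionary or a
# grapheme-to-phoneme model; this table is a hypothetical stand-in.
TOY_LEXICON = {
    "speech": ["S", "P", "IY1", "CH"],
    "through": ["TH", "R", "UW1"],
    "though": ["DH", "OW1"],
}


def text_to_chars(text):
    """Character-level input: one symbol per character."""
    return list(text.lower())


def text_to_phonemes(text, lexicon=TOY_LEXICON):
    """Phoneme-level input: look up each word, mark word boundaries with ' '."""
    phonemes = []
    for word in text.lower().split():
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r} in toy lexicon")
        if phonemes:
            phonemes.append(" ")  # assumed word-boundary symbol
        phonemes.extend(lexicon[word])
    return phonemes


def build_vocab(symbols):
    """Map each distinct symbol to an integer ID; 0 is reserved for padding."""
    return {sym: i + 1 for i, sym in enumerate(sorted(set(symbols)))}


def encode(sequence, vocab):
    """Turn a symbol sequence into the integer IDs an embedding layer expects."""
    return [vocab[sym] for sym in sequence]


text = "through though"
chars = text_to_chars(text)
phones = text_to_phonemes(text)
phone_ids = encode(phones, build_vocab(phones))

print(len(chars), chars)
print(len(phones), phones)
print(phone_ids)
```

In this toy input, "through" and "though" share the letters "ough" but pronounce them differently; the character sequence leaves that ambiguity for the model to resolve, while the phoneme sequence states the pronunciation explicitly and is also shorter.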
