Open Access
Whisper‐to‐speech conversion using restricted Boltzmann machine arrays
Author(s) - Li Jingjie, McLoughlin Ian V., Dai LiRong, Ling Zhenhua
Publication year - 2014
Publication title - Electronics Letters
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.375
H-Index - 146
ISSN - 1350-911X
DOI - 10.1049/el.2014.1645
Subject(s) - prosody, boltzmann machine, intelligibility (philosophy), computer science, speech recognition, speech enhancement, speech processing, speech synthesis, artificial intelligence, noise reduction, artificial neural network, philosophy, epistemology
Whispers are a natural vocal communication mechanism in which the vocal cords do not vibrate normally. The lack of glottal‐induced pitch leads to low energy, and an inherently noise‐like spectral distribution reduces intelligibility. Much research has been devoted to the processing of whispers, including the conversion of whispers to speech. Unfortunately, even the best speech reconstructed to date by any of several approaches still sounds obviously artificial and muffled, and suffers from unnatural prosody. To address these issues, the novel use of multiple restricted Boltzmann machines (RBMs) is reported as a statistical conversion model between whisper and speech spectral envelopes. Moreover, the accuracy of the estimated pitch is improved by applying machine learning techniques to pitch estimation within voiced (V) regions only. Both objective and subjective evaluations show that this new method improves the quality of whisper‐reconstructed speech compared with state‐of‐the‐art approaches.
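
To give a rough feel for how an RBM can act as a statistical model linking two feature sets, the following is a minimal sketch and not the authors' implementation. It assumes a single Bernoulli RBM trained with one-step contrastive divergence (CD-1) on a joint vector of toy binary "whisper" and "speech" features. The abstract does not give the model details, so the class `TinyRBM`, its `convert` method, the toy data and all hyperparameters are hypothetical. Real spectral envelopes are continuous and would need a suitable visible-unit type.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyRBM:
    """Hypothetical Bernoulli RBM trained with CD-1 (illustration only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0, lr=0.1):
        """One CD-1 update on a batch; returns the reconstruction error."""
        h0 = self.hidden_probs(v0)
        h_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))

    def convert(self, whisper, n_whisper):
        """Assumed conversion rule: fix the whisper half, fill the speech
        half with 0.5, and read the speech half of the reconstruction."""
        n_speech = self.W.shape[0] - n_whisper
        unknown = np.full((whisper.shape[0], n_speech), 0.5)
        v = np.hstack([whisper, unknown])
        return self.visible_probs(self.hidden_probs(v))[:, n_whisper:]

# Toy stand-in data: two binary prototypes; "speech" simply copies "whisper".
protos = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
whisper = protos[np.random.default_rng(1).integers(0, 2, size=64)]
speech = whisper.copy()
joint = np.hstack([whisper, speech])

rbm = TinyRBM(n_visible=8, n_hidden=4)
errors = [rbm.cd1_step(joint) for _ in range(200)]
converted = rbm.convert(whisper[:2], n_whisper=4)
```

Here `converted` holds per-feature probabilities for the speech half given the whisper half. How the paper arranges its multiple RBMs and estimates pitch is not specified in the abstract and is not modelled in this sketch.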
