Turbo Automatic Speech Recognition

Open Access

Author(s) - Simon Receveur, Robin Weiss, Tim Fingscheidt

Publication year - 2016

Publication title - IEEE/ACM Transactions on Audio, Speech, and Language Processing

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.916

H-Index - 56

eISSN - 2329-9304

pISSN - 2329-9290

DOI - 10.1109/taslp.2016.2520364

Subject(s) - signal processing and analysis; computing and processing; communication, networking and broadcast technologies; general topics for engineers

The performance of automatic speech recognition (ASR) systems can be significantly improved by integrating further sources of information, such as additional modalities, acoustic channels, or acoustic models. The resulting information fusion problem exhibits striking parallels to problems in digital communications, where the discovery of turbo codes by Berrou et al. was a groundbreaking innovation. In this paper, we show how to successfully apply the turbo principle to the domain of ASR and thereby provide solutions to this information fusion problem. The contribution of our work is fourfold: First, we review the turbo decoding forward-backward algorithm (FBA), giving detailed insights into turbo ASR and providing a new interpretation and formulation of the so-called extrinsic information passed between the recognizers. Second, we present a real-time capable turbo-decoding Viterbi algorithm suitable for practical information fusion and recognition tasks. Third, we present simulation results for a multimodal example of information fusion. Finally, we show that both our turbo FBA and our turbo Viterbi algorithm are also suitable for a single-channel multimodel recognition task obtained by using two acoustic feature extraction methods. On a small-vocabulary task (challenging, since spelling is included), our proposed turbo ASR approach outperforms even the best reference system: averaged over all SNR conditions and investigated noise types, it achieves a relative word error rate (WER) reduction of 22.4% on the audio-visual task and 18.2% on the audio-only task.
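To make the idea of recognizers exchanging extrinsic information more concrete, the following is a minimal toy sketch in Python and not the authors' implementation. It assumes two recognizers that share one small HMM state space and differ only in their per-frame observation likelihoods. The function names (forward_backward, turbo_fuse), the division-based definition of the extrinsic term (borrowed from the usual convention in turbo decoding), and all numbers are illustrative assumptions; the abstract does not specify the paper's own formulation.

```python
def forward_backward(trans, init, likes):
    """Per-frame state posteriors of an HMM (scaled forward-backward).

    trans[i][j]: transition probability from state i to state j
    init[i]:     initial probability of state i
    likes[t][i]: (possibly prior-weighted) likelihood of frame t in state i
    """
    T, S = len(likes), len(init)
    # Forward pass, normalized per frame to avoid numerical underflow.
    first = [init[i] * likes[0][i] for i in range(S)]
    alpha = [[a / sum(first) for a in first]]
    for t in range(1, T):
        cur = [likes[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(S))
               for j in range(S)]
        alpha.append([c / sum(cur) for c in cur])
    # Backward pass, also normalized per frame.
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        cur = [sum(trans[i][j] * likes[t + 1][j] * beta[t + 1][j] for j in range(S))
               for i in range(S)]
        beta[t] = [c / sum(cur) for c in cur]
    # Combine both passes into posteriors that sum to one in every frame.
    post = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(S)]
        post.append([x / sum(g) for x in g])
    return post


def turbo_fuse(trans, init, likes_a, likes_b, iterations=3):
    """Alternate between two recognizers, passing extrinsic information.

    ASSUMPTION: the extrinsic term is the posterior with the received
    prior divided out, so a recognizer never gets its own input back.
    """
    T, S = len(likes_a), len(init)
    extrinsic = [[1.0] * S for _ in range(T)]  # flat prior before the first pass
    streams = [likes_a, likes_b]
    post = None
    for k in range(2 * iterations):
        likes = streams[k % 2]
        # Weight the local likelihoods with the other recognizer's extrinsic term.
        weighted = [[likes[t][i] * extrinsic[t][i] for i in range(S)]
                    for t in range(T)]
        post = forward_backward(trans, init, weighted)
        # Divide out what was received; keep only what this pass contributed.
        new_ext = []
        for t in range(T):
            row = [post[t][i] / max(extrinsic[t][i], 1e-12) for i in range(S)]
            new_ext.append([r / sum(row) for r in row])
        extrinsic = new_ext
    return post


# Hypothetical two-state example with three frames per stream.
trans = [[0.9, 0.1], [0.1, 0.9]]
init = [0.5, 0.5]
likes_a = [[0.6, 0.4]] * 3   # e.g. an audio stream (made-up values)
likes_b = [[0.7, 0.3]] * 3   # e.g. a second stream or feature type (made-up values)
fused = turbo_fuse(trans, init, likes_a, likes_b)
```

In this sketch each pass sees its own likelihoods plus only the information the other recognizer contributed, which is the property that distinguishes turbo-style iteration from simply multiplying the two likelihood streams once.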
