Hybrid CTC-Attention Network-Based End-to-End Speech Recognition System for Korean Language

Author(s) - Ho-Sung Park, Changmin Kim, Hyunsoo Son, Soonshin Seo, JiHwan Kim


Open Access

Publication year - 2022

Publication title - Journal of Web Engineering

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.151

H-Index - 13

eISSN - 1544-5976

pISSN - 1540-9589

DOI - 10.13052/jwe1540-9589.2126

Subject(s) - computer science, hidden Markov model, speech recognition, end-to-end principle, language model, Korean language, artificial neural network, artificial intelligence, character (mathematics), time delay neural network, natural language processing, linguistics, mathematics, philosophy, geometry

This study proposes an end-to-end automatic speech recognition system for Korean based on a hybrid CTC-attention network. Deep neural network/hidden Markov model (DNN/HMM)-based speech recognition systems have driven dramatic improvements in this area. However, such systems are difficult for non-experts to develop for new applications. End-to-end approaches simplify the speech recognition system into a single-network architecture, which makes it possible to build a recognizer without expert knowledge. We propose a hybrid CTC-attention network as the end-to-end speech recognition model for Korean. The model uses a connectionist temporal classification (CTC) objective function during attention model training, which improves both recognition accuracy and training speed. In most languages, end-to-end speech recognition uses characters as output labels. For Korean, however, character-based end-to-end recognition is inefficient because the language has 11,172 possible characters, a large number compared with other languages: for example, English has 26 characters and Japanese has 50. To address this problem, we use 49 Korean graphemes as output labels. Experiments show a character error rate (CER) of 10.02% when 740 hours of Korean training data are used.
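To make the label-set argument concrete, the following minimal Python sketch shows where the 11,172 figure comes from and how a precomposed Korean syllable can be split into its constituent graphemes (jamo). It relies only on the standard Unicode Hangul arithmetic (19 initial consonants, 21 medial vowels, and 27 final consonants plus the "no final" case). The function name `decompose_syllable` is hypothetical, and the exact 49-grapheme inventory used in the paper is not given in the abstract, so this sketch should not be read as the authors' label set or preprocessing.

```python
# Sketch under stated assumptions: standard Unicode Hangul decomposition,
# not the paper's own 49-grapheme labeling scheme (which the abstract does not list).

HANGUL_BASE = 0xAC00   # first precomposed syllable, U+AC00
INITIAL_BASE = 0x1100  # first initial-consonant jamo
MEDIAL_BASE = 0x1161   # first medial-vowel jamo
FINAL_BASE = 0x11A7    # one below the first final-consonant jamo (index 0 = no final)

NUM_INITIAL = 19
NUM_MEDIAL = 21
NUM_FINAL = 28         # 27 final consonants + "no final"

# 19 * 21 * 28 = 11,172 possible precomposed syllables
NUM_SYLLABLES = NUM_INITIAL * NUM_MEDIAL * NUM_FINAL


def decompose_syllable(ch):
    """Split one precomposed Hangul syllable into its jamo (hypothetical helper).

    Characters outside the Hangul syllable block are returned unchanged.
    """
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < NUM_SYLLABLES:
        return [ch]
    initial, rest = divmod(index, NUM_MEDIAL * NUM_FINAL)
    medial, final = divmod(rest, NUM_FINAL)
    jamo = [chr(INITIAL_BASE + initial), chr(MEDIAL_BASE + medial)]
    if final:  # final index 0 means the syllable has no final consonant
        jamo.append(chr(FINAL_BASE + final))
    return jamo


def to_graphemes(text):
    """Flatten a string into a grapheme-level label sequence."""
    labels = []
    for ch in text:
        labels.extend(decompose_syllable(ch))
    return labels


if __name__ == "__main__":
    print(NUM_SYLLABLES)
    print(to_graphemes("한국어"))
```

Under this decomposition the output layer of a recognizer only has to cover a few dozen jamo symbols rather than every possible syllable block, which is the motivation the abstract gives for grapheme-level labels.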
