Cross-Lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
Author(s) - Zhengdong Yang, Qianying Liu, Sheng Li, Fei Cheng, Chenhui Chu
Publication year - 2025
Publication title - IEEE Transactions on Audio, Speech and Language Processing
Language(s) - English
Resource type - Magazines
eISSN - 2998-4173
DOI - 10.1109/taslpro.2025.3617233
Subject(s) - Signal Processing and Analysis; Computing and Processing; Fields, Waves and Electromagnetics
We present a novel approach centered on the decoding stage of Automatic Speech Recognition (ASR) that enhances multilingual performance, especially for low-resource languages. The approach uses a cross-lingual embedding clustering method to construct a hierarchical Softmax (H-Softmax) decoder, so that similar tokens across different languages share similar decoder representations. This addresses a limitation of the previous Huffman-based H-Softmax method, which relied on shallow features when assessing token similarity. Experiments on a downsampled dataset of 15 languages demonstrate the effectiveness of the approach in improving low-resource multilingual ASR accuracy.
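The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: group tokens by clustering their embeddings, then let a hierarchical softmax first choose a cluster and then a token inside it. The toy embeddings, the plain k-means routine, the two-level tree, and every name below are assumptions made for illustration and should not be read as the authors' method.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means (illustrative); returns one cluster id per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # fancy indexing copies
    for _ in range(iters):
        # Assign each point to its nearest center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if empty).
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    # Relabel so cluster ids are 0..n-1 with no empty clusters.
    _, labels = np.unique(labels, return_inverse=True)
    return labels

def log_softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    return z - np.log(np.exp(z).sum())

def h_softmax_log_probs(h, labels, cluster_w, token_w):
    """Two-level H-Softmax: log P(token) = log P(cluster) + log P(token | cluster)."""
    log_p = np.empty(len(labels))
    log_p_cluster = log_softmax(cluster_w @ h)
    for c in range(len(cluster_w)):
        members = np.flatnonzero(labels == c)
        log_p[members] = log_p_cluster[c] + log_softmax(token_w[members] @ h)
    return log_p

rng = np.random.default_rng(0)

# Hypothetical tokens from two languages, with toy embeddings in which
# same-sounding tokens lie close together regardless of language.
tokens = ["a_en", "a_es", "b_en", "b_es", "k_en", "k_es"]
base = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
emb = np.repeat(base, 2, axis=0) + 0.1 * rng.standard_normal((6, 2))

labels = kmeans(emb, k=3)
n_clusters = labels.max() + 1

# Randomly initialised decoder parameters and a dummy decoder hidden state.
hidden = 4
cluster_w = rng.standard_normal((n_clusters, hidden))
token_w = rng.standard_normal((len(tokens), hidden))
h = rng.standard_normal(hidden)

log_p = h_softmax_log_probs(h, labels, cluster_w, token_w)
print(dict(zip(tokens, labels.tolist())))
print(np.exp(log_p).sum())  # the leaf probabilities form a valid distribution
```

Because each token's probability is the product of its cluster probability and a within-cluster probability, tokens placed in the same cluster share the cluster-level decoder parameters, which is one way similar tokens from different languages could end up with related decoder representations.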