Cross-Lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
Author(s) - Zhengdong Yang, Qianying Liu, Sheng Li, Fei Cheng, Chenhui Chu
Publication year - 2025
Publication title - IEEE Transactions on Audio, Speech and Language Processing
Language(s) - English
Resource type - Magazines
eISSN - 2998-4173
DOI - 10.1109/taslpro.2025.3617233
Subject(s) - signal processing and analysis, computing and processing, fields, waves and electromagnetics
We present a novel approach to the decoding stage of Automatic Speech Recognition (ASR) that improves multilingual performance, especially for low-resource languages. The approach uses cross-lingual embedding clustering to construct a hierarchical Softmax (H-Softmax) decoder, so that similar tokens across different languages share similar decoder representations. This addresses a limitation of the previous Huffman-based H-Softmax method, which relied on shallow features when assessing token similarity. Experiments on a downsampled dataset of 15 languages demonstrate the effectiveness of our approach in improving low-resource multilingual ASR accuracy.
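
As a rough illustration of the general idea (not the authors' implementation), the sketch below clusters token embeddings and uses the clusters as the first level of a two-level hierarchical softmax, so that tokens in the same cluster share a cluster-level decision. Everything here is an assumption made for illustration: the random vectors stand in for cross-lingual token embeddings, plain k-means stands in for whatever clustering the paper uses, the hierarchy has only two levels rather than a deeper tree, and the names `kmeans`, `two_level_probs`, `cluster_w`, and `token_w` are hypothetical.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain k-means (illustrative stand-in); returns a cluster id per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = x[assign == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return assign


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def two_level_probs(hidden, cluster_w, token_w, assign):
    """P(token) = P(cluster | hidden) * P(token | cluster, hidden)."""
    p_cluster = softmax(cluster_w @ hidden)
    probs = np.zeros(len(assign))
    for c in range(len(p_cluster)):
        idx = np.flatnonzero(assign == c)
        # Normalize only over the tokens that belong to cluster c.
        probs[idx] = p_cluster[c] * softmax(token_w[idx] @ hidden)
    return probs


rng = np.random.default_rng(0)
vocab_size, dim, k = 12, 4, 3

# Assumption: random vectors stand in for cross-lingual token embeddings.
token_emb = rng.normal(size=(vocab_size, dim))
assign = kmeans(token_emb, k)
# Relabel to 0..n-1 so that no cluster id is left without tokens.
_, assign = np.unique(assign, return_inverse=True)
n_clusters = int(assign.max()) + 1

cluster_w = rng.normal(size=(n_clusters, dim))  # hypothetical cluster-level weights
token_w = rng.normal(size=(vocab_size, dim))    # hypothetical token-level weights
hidden = rng.normal(size=dim)                   # stand-in decoder hidden state

probs = two_level_probs(hidden, cluster_w, token_w, assign)
print(probs.sum())
```

Because each token's probability is the product of a cluster probability and a within-cluster probability, the values in `probs` form a valid distribution over the whole vocabulary. A tree with more levels, as H-Softmax typically uses, applies the same factorization along each root-to-leaf path.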
