We present a novel approach to the decoding stage of Automatic Speech Recognition (ASR) that improves multilingual performance, especially for low-resource languages. The approach uses cross-lingual embedding clustering to construct a hierarchical Softmax (H-Softmax) decoder in which similar tokens across different languages share similar decoder representations. This addresses a limitation of the previous Huffman-based H-Softmax method, which relied on shallow features to assess token similarity. Experiments on a downsampled dataset of 15 languages demonstrate that the approach improves low-resource multilingual ASR accuracy.
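The idea of building an H-Softmax decoder from clustered token embeddings can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the use of plain k-means, the two-level tree, the toy 2-D embeddings, the `lang:token` naming, and the helper names `kmeans`, `build_clusters`, and `h_softmax`. The weights are toy stand-ins for parameters that a real decoder would learn.

```python
import math
import random

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumed clustering method); returns a cluster id per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Recompute each center; an empty cluster keeps its old center.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def build_clusters(tokens, assign):
    """Group tokens by cluster id and reindex to 0..n-1, skipping empty clusters."""
    groups = {}
    for t, a in zip(tokens, assign):
        groups.setdefault(a, []).append(t)
    return {i: groups[a] for i, a in enumerate(sorted(groups))}

def h_softmax(hidden, cluster_weights, token_weights, clusters):
    """Two-level H-Softmax: P(token | h) = P(cluster | h) * P(token | cluster, h)."""
    p_cluster = softmax([dot(hidden, w) for w in cluster_weights])
    probs = {}
    for c, members in clusters.items():
        p_in = softmax([dot(hidden, token_weights[t]) for t in members])
        for t, p in zip(members, p_in):
            probs[t] = p_cluster[c] * p
    return probs

# Hypothetical cross-lingual embeddings: similar tokens from two languages lie close together.
tokens = ["en:a", "es:a", "en:t", "es:t"]
embeddings = [(1.0, 0.0), (0.9, 0.2), (0.0, 1.0), (0.2, 0.9)]

clusters = build_clusters(tokens, kmeans(embeddings, k=2))

# Toy stand-ins for learned parameters: token weights reuse the embeddings,
# cluster weights are the mean embedding of each cluster's members.
token_weights = dict(zip(tokens, embeddings))
cluster_weights = [
    [sum(col) / len(members) for col in zip(*(token_weights[t] for t in members))]
    for members in clusters.values()
]

hidden = (0.8, 0.1)  # hypothetical decoder hidden state
probs = h_softmax(hidden, cluster_weights, token_weights, clusters)
print(clusters)
print(probs)
```

In this sketch the two `a` tokens and the two `t` tokens end up under the same cluster node, so they share the cluster-level weight vector. That sharing is how similar tokens from different languages come to have similar decoder representations. A deeper tree, or a different clustering method, would follow the same factorization with more levels.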