We present a novel approach centered on the decoding stage of Automatic Speech Recognition (ASR) that improves multilingual performance, especially for low-resource languages. Our method uses cross-lingual embedding clustering to construct a hierarchical Softmax (H-Softmax) decoder in which similar tokens across different languages share similar decoder representations. This addresses a limitation of the previous Huffman-based H-Softmax method, which relied on shallow features when assessing token similarity. Through experiments on a downsampled dataset of 15 languages, we demonstrate the effectiveness of our approach in improving low-resource multilingual ASR accuracy.
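To make the idea of a clustering-based H-Softmax concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-level hierarchy, plain k-means as the clustering step, randomly initialized (untrained) decoder weights, and a hidden state with the same dimension as the token embeddings; none of these choices is specified in the abstract. The class `TwoLevelHSoftmax`, the helper names, and the toy tokens such as `en:cat` and `de:Katze` are hypothetical and serve only as illustration.

```python
import math
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (an assumed stand-in for the clustering step)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class TwoLevelHSoftmax:
    """Hypothetical two-level H-Softmax: P(token) = P(cluster) * P(token | cluster)."""

    def __init__(self, token_embeddings, num_clusters, seed=0):
        tokens = list(token_embeddings)
        vectors = [token_embeddings[t] for t in tokens]
        assign = kmeans(vectors, num_clusters, seed=seed)
        self.cluster_of = dict(zip(tokens, assign))
        self.members = {}
        for tok, c in zip(tokens, assign):
            self.members.setdefault(c, []).append(tok)
        self.cluster_ids = sorted(self.members)
        # Untrained decoder weights; a real system would learn these.
        rng = random.Random(seed)
        dim = len(vectors[0])
        self.cluster_w = {c: [rng.gauss(0, 0.1) for _ in range(dim)] for c in self.cluster_ids}
        self.token_w = {t: [rng.gauss(0, 0.1) for _ in range(dim)] for t in tokens}

    def distribution(self, hidden):
        """Return a probability for every token given a decoder hidden state."""
        p_cluster = softmax([dot(self.cluster_w[c], hidden) for c in self.cluster_ids])
        probs = {}
        for pc, c in zip(p_cluster, self.cluster_ids):
            toks = self.members[c]
            p_tok = softmax([dot(self.token_w[t], hidden) for t in toks])
            for pt, t in zip(p_tok, toks):
                probs[t] = pc * pt
        return probs


# Toy cross-lingual embeddings (made up): translations lie close together.
toy = {
    "en:cat": [1.0, 0.1],
    "de:Katze": [0.9, 0.2],
    "en:run": [0.1, 1.0],
    "de:laufen": [0.2, 0.9],
}
model = TwoLevelHSoftmax(toy, num_clusters=2)
probs = model.distribution([0.5, -0.3])
print(model.members)
print(round(sum(probs.values()), 6))
```

In this toy setup, tokens that are close in the shared embedding space end up under the same cluster node, so they share the cluster-level decoder parameters. The token probabilities still sum to one because each is the product of a cluster probability and a within-cluster probability.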