In this paper, we propose a self-training approach to automatic speech recognition (ASR) in low-resource settings. Although self-training has been extensively developed and evaluated for high-resource languages such as English, its application to low-resource languages such as Punjabi remains limited, even though Punjabi is spoken by millions of people worldwide. The scarcity of annotated data has hindered the development of accurate ASR systems for low-resource languages such as Punjabi and Māori. To address this issue, we present an effective self-training approach that generates highly accurate pseudo-labels for unlabeled low-resource speech. Our experimental analysis shows that the approach significantly reduces word error rate, achieving a relative improvement of 14.94% over a baseline model across four real speech datasets. Furthermore, the proposed approach achieves the best results on the Common Voice Punjabi dataset.