Recent breakthroughs in natural language processing (NLP) have greatly increased the presence of automatic speech recognition (ASR) systems in our daily lives. However, for many low-resource languages, ASR models still need improvement, due in part to the difficulty of acquiring pertinent data. This project aims to advance research on ASR models for Swiss German dialects by providing insights into the performance of state-of-the-art ASR models on recently published Swiss German speech datasets. We propose a novel loss that takes into account the semantic distance between the predicted and the ground-truth labels. We outperform current state-of-the-art results by fine-tuning OpenAI's Whisper model on Swiss German datasets.