In this paper, we present several baselines for automatic speech recognition (ASR) models for the two official written languages in Norway: Bokm{\aa}l and Nynorsk. We compare the performance of models of varying sizes and pre-training approaches on multiple Norwegian speech datasets. Additionally, we measure the performance of these models against previous state-of-the-art ASR models, as well as on out-of-domain datasets. We improve the state of the art on the Norwegian Parliamentary Speech Corpus (NPSC) from a word error rate (WER) of 17.10\% to 7.60\%, with models achieving 5.81\% for Bokm{\aa}l and 11.54\% for Nynorsk. We also discuss the challenges and potential solutions for further improving ASR models for Norwegian.