In this paper, we compare monolingual Wav2Vec 2.0 models with various multilingual models to determine whether speech recognition performance can be improved on a unique oral history archive containing many mixed-language sentences. Our main goal is to advance research on this unique dataset, which is an extremely valuable part of our cultural heritage. Our results suggest that monolingual speech recognition models are, in most cases, superior to multilingual models, even when processing an oral history archive full of mixed-language sentences from non-native speakers. We performed the same experiments on the public CommonVoice dataset to verify our results. We contribute to the research community by releasing our pre-trained models to the public.