Self-supervised learning (SSL) for speech representations has attracted significant interest in recent years because it scales to large amounts of unlabeled data. However, much of the progress, in both pre-training and downstream evaluation, has remained concentrated on monolingual models that consider only English. Few models consider other languages, and even fewer consider indigenous ones. In our submission to the New Language Track of the ASRU 2023 ML-SUPERB Challenge, we present an automatic speech recognition (ASR) corpus for Quechua, an indigenous South American language. We benchmark the efficacy of large SSL models on low-resource ASR for Quechua and 6 other indigenous languages, including Guarani and Bribri. Our results show surprisingly strong performance by state-of-the-art SSL models, indicating the potential generalizability of large-scale models to real-world data.