Self-supervised learning (SSL) for speech representation learning has attracted significant interest in recent years because it scales to large amounts of unlabeled data. However, much of the progress, in both pre-training and downstream evaluation, has remained concentrated in monolingual models that consider only English. Few models consider other languages, and even fewer consider indigenous ones. In our submission to the New Language Track of the ASRU 2023 ML-SUPERB Challenge, we present an ASR corpus for Quechua, an indigenous South American language. We benchmark the efficacy of large SSL models on low-resource ASR for Quechua and 6 other indigenous languages, including Guarani and Bribri. Our results show surprisingly strong performance by state-of-the-art SSL models, indicating the potential of large-scale models to generalize to real-world data.