The Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard for benchmarking the performance of self-supervised learning (SSL) models on a variety of speech processing tasks. However, SUPERB's evaluation largely considers English speech. This paper presents multilingual SUPERB (ML-SUPERB), which covers 143 languages, ranging from high-resource to endangered, and considers both automatic speech recognition (ASR) and language identification (LID). Following the concept of SUPERB, ML-SUPERB uses frozen SSL features and adopts a simple framework for multilingual tasks in which only a shallow downstream model is learned. As with the SUPERB benchmark, we find that speech SSL models can significantly improve performance over filter bank (FBANK) features. Furthermore, we find that multilingual models do not always perform better than their monolingual counterparts. We will release ML-SUPERB as a challenge with organized datasets and reproducible training scripts for future multilingual representation research.
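The core recipe, frozen upstream features plus a small trainable downstream model, can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the ML-SUPERB implementation. It assumes a SUPERB-style learnable softmax-weighted sum over the frozen SSL layers and uses a single per-frame linear layer as a stand-in for the shallow downstream model. The function names `weighted_layer_sum` and `linear_head` are hypothetical. The training loop is omitted; the objective (for instance, a CTC loss for ASR) is likewise an assumption here.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_feats, layer_logits):
    """Collapse frozen per-layer features (layers x frames x dim) into one
    frames x dim sequence using softmax-normalized layer weights.

    Assumption: SUPERB-style weighted sum; only `layer_logits` is trainable.
    """
    weights = softmax(layer_logits)
    num_frames = len(layer_feats[0])
    dim = len(layer_feats[0][0])
    out = [[0.0] * dim for _ in range(num_frames)]
    for w, layer in zip(weights, layer_feats):
        for t in range(num_frames):
            for d in range(dim):
                out[t][d] += w * layer[t][d]
    return out


def linear_head(feats, weight, bias):
    """Hypothetical shallow downstream model: a per-frame linear projection
    from the feature dim to output units (e.g. characters for ASR).

    `weight` has shape dim x num_outputs; `bias` has length num_outputs.
    """
    return [
        [sum(f[d] * weight[d][k] for d in range(len(f))) + bias[k]
         for k in range(len(bias))]
        for f in feats
    ]


if __name__ == "__main__":
    random.seed(0)
    num_layers, num_frames, dim, num_outputs = 3, 4, 2, 5
    # Stand-in for frozen SSL hidden states; these are never updated.
    frozen = [[[random.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(num_frames)]
              for _ in range(num_layers)]
    # Trainable downstream parameters (initialized only, not trained here).
    layer_logits = [0.0] * num_layers
    weight = [[random.gauss(0.0, 0.1) for _ in range(num_outputs)]
              for _ in range(dim)]
    bias = [0.0] * num_outputs
    logits = linear_head(weighted_layer_sum(frozen, layer_logits), weight, bias)
    print(len(logits), len(logits[0]))  # 4 frames, 5 outputs per frame
```

In this sketch only `layer_logits`, `weight`, and `bias` would be updated during training, while the upstream features stay fixed. Keeping the upstream frozen is what makes it comparatively cheap to evaluate many SSL models under the same downstream setup.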