The Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard for benchmarking Self-Supervised Learning (SSL) models on a range of speech processing tasks. However, SUPERB evaluates largely on English speech. This paper presents multilingual SUPERB (ML-SUPERB), which covers 143 languages, ranging from high-resource to endangered, and considers both automatic speech recognition and language identification. Following the design of SUPERB, ML-SUPERB uses frozen SSL features and a simple framework for multilingual tasks in which only a shallow downstream model is learned. As in the SUPERB benchmark, we find that speech SSL models significantly improve performance over FBANK features. We also find that multilingual models do not always outperform their monolingual counterparts. We will release ML-SUPERB as a challenge, with organized datasets and reproducible training scripts, to support future research on multilingual representations.
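The frozen-feature setup described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the ML-SUPERB implementation: the upstream is a hypothetical stand-in built from fixed random layers rather than a pretrained SSL model, the layer outputs are combined by a uniform average, the downstream model is a single softmax layer for language identification, and the data is synthetic. The point is only that the upstream parameters are never updated while the shallow head is trained.

```python
import numpy as np

rng = np.random.default_rng(0)
N_LAYERS, IN_DIM, FEAT_DIM, N_LANGS = 3, 8, 16, 4  # illustrative sizes

# Hypothetical stand-in for a pretrained SSL upstream: fixed random layers.
# A real setup would load a pretrained model here instead.
FROZEN_LAYERS = [
    rng.normal(size=(IN_DIM if i == 0 else FEAT_DIM, FEAT_DIM)) / np.sqrt(FEAT_DIM)
    for i in range(N_LAYERS)
]


def extract_frozen_features(frames):
    """Map frames (batch, time, IN_DIM) to utterance features (batch, FEAT_DIM)."""
    hidden, pooled = frames, []
    for weight in FROZEN_LAYERS:  # read-only: the upstream is never updated
        hidden = np.tanh(hidden @ weight)
        pooled.append(hidden.mean(axis=1))  # mean-pool over time
    return np.mean(pooled, axis=0)  # uniform layer average (an assumption)


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_shallow_head(features, labels, n_classes, steps=200, lr=0.1):
    """Train a single softmax layer on fixed features with gradient descent."""
    batch, dim = features.shape
    weight = np.zeros((dim, n_classes))
    bias = np.zeros(n_classes)
    one_hot = np.eye(n_classes)[labels]
    losses = []
    for _ in range(steps):
        probs = softmax(features @ weight + bias)
        picked = probs[np.arange(batch), labels]
        losses.append(float(-np.log(picked + 1e-12).mean()))
        grad_logits = (probs - one_hot) / batch  # cross-entropy gradient
        weight -= lr * (features.T @ grad_logits)
        bias -= lr * grad_logits.sum(axis=0)
    return weight, bias, losses


# Synthetic "utterances": each language label shifts the frame statistics.
labels = rng.integers(0, N_LANGS, size=64)
frames = rng.normal(size=(64, 20, IN_DIM)) + 0.5 * labels[:, None, None]

features = extract_frozen_features(frames)
weight, bias, losses = train_shallow_head(features, labels, N_LANGS)
print(f"training loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the features are computed once and held fixed, swapping in a different upstream only changes `extract_frozen_features`. The downstream training loop stays the same, which is what makes comparisons across SSL models straightforward in this kind of benchmark.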