SUPERB was proposed to evaluate the generalizability of self-supervised learning (SSL) speech models across various tasks. However, it incurs high computational costs due to the large datasets and diverse tasks. In this paper, we introduce MiniSUPERB, a lightweight benchmark that efficiently evaluates SSL speech models, producing results comparable to SUPERB at significantly lower computational cost. We carefully select representative tasks, sample datasets, and extract model representations offline. Our approach achieves Spearman's rank correlations of 0.954 and 0.982 with SUPERB Paper and SUPERB Challenge, respectively. Additionally, we reduce the computational cost by 97% in terms of Multiply-ACcumulate operations (MACs). Furthermore, we evaluate SSL speech models in few-shot scenarios and observe significant variations in their performance. To our knowledge, this is the first study to examine both the computational cost of the model itself and the cost of evaluating it on a benchmark.
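The two headline numbers above, rank agreement between benchmarks and relative cost reduction, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names (`average_ranks`, `spearman`, `relative_reduction`) and all scores and MAC counts below are hypothetical placeholders chosen only to show the arithmetic.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def relative_reduction(full_cost: float, reduced_cost: float) -> float:
    """Fraction of cost saved when going from full_cost to reduced_cost."""
    return 1.0 - reduced_cost / full_cost


if __name__ == "__main__":
    # Hypothetical overall scores for five models (NOT values from the paper).
    full_benchmark_scores = [61.0, 68.5, 72.1, 75.4, 79.9]
    light_benchmark_scores = [58.2, 66.0, 70.3, 77.8, 76.5]
    print(f"Spearman: {spearman(full_benchmark_scores, light_benchmark_scores):.3f}")

    # Hypothetical MAC counts (arbitrary units), chosen to give a 97% reduction.
    print(f"Reduction: {relative_reduction(100.0, 3.0):.0%}")
```

A correlation near 1 means the lightweight benchmark orders the models almost identically to the full one, which is the property a reduced benchmark needs in order to stand in for model ranking.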