Self-supervised learning (SSL) is a popular research topic in speech processing. Successful SSL speech models must generalize well. SUPERB was proposed to evaluate how well SSL speech models perform across many speech tasks. However, because the tasks are so diverse, the evaluation process is computationally expensive. We present MiniSUPERB, a lightweight benchmark that evaluates SSL speech models efficiently, producing results comparable to SUPERB at a much lower computational cost. We select representative tasks, sample the datasets, and extract model representations offline, achieving Spearman's rank correlations of 0.954 and 0.982 with SUPERB Paper and SUPERB Challenge, respectively. Meanwhile, the computational cost, measured in MACs (number of Multiply-ACcumulate operations), is reduced by 97% on the tasks we choose. To the best of our knowledge, this is the first study to examine not only the computational cost of a model itself but also the cost of evaluating it on a benchmark.
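To make the two reported quantities concrete, the following minimal Python sketch computes a Spearman's rank correlation between two sets of benchmark scores and a relative cost reduction. It is an illustration only, not the evaluation code behind the results above: the helper names (`average_ranks`, `spearman`, `relative_reduction`), the model scores, and the MAC counts are all hypothetical placeholders.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def relative_reduction(full_cost, reduced_cost):
    """Fraction of the full cost that is saved by the reduced setting."""
    return 1.0 - reduced_cost / full_cost


# Hypothetical scores for five models (NOT taken from the paper).
full_benchmark_scores = [61.2, 70.5, 74.8, 79.1, 82.3]
mini_benchmark_scores = [58.0, 69.4, 76.0, 75.2, 81.7]

rho = spearman(full_benchmark_scores, mini_benchmark_scores)
print(f"Spearman's rho: {rho:.3f}")

# Hypothetical MAC counts in arbitrary units (NOT taken from the paper).
print(f"Cost reduction: {relative_reduction(100.0, 3.0):.0%}")
```

In this toy example only two adjacent models swap places between the two rankings, so the correlation stays high. A correlation near 1 means the lightweight benchmark orders the models almost exactly as the full benchmark does, which is the property a reduced benchmark needs in order to stand in for the full one.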