Self-supervised learning (SSL) has achieved remarkable success across a wide range of speech-processing tasks. To improve its efficiency, previous works often rely on compression techniques. A notable recent example is DPHuBERT, which jointly applies knowledge distillation (KD) and structured pruning to learn a significantly smaller SSL model. In this paper, we contribute to this line of research by introducing SKILL, a novel method that distills groups of layers rather than individual, arbitrarily selected layers of the teacher network. The layers to distill are identified by applying hierarchical clustering to layer similarity measures. Extensive experiments show that our distilled version of WavLM Base+ not only outperforms DPHuBERT but also achieves state-of-the-art results in the 30M-parameter model class across several SUPERB tasks.
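As an illustration of the layer-grouping step, the sketch below clusters a teacher's layers by representational similarity. It is not the authors' implementation. The similarity measure (linear CKA), the linkage rule (average linkage on distance 1 − similarity), the absence of any contiguity constraint between grouped layers, and the helper names `linear_cka`, `similarity_matrix` and `cluster_layers` are all assumptions made for illustration. The inputs stand in for per-layer hidden states of a teacher such as WavLM Base+ computed on the same audio, flattened to `(frames, dim)` arrays; the demo uses synthetic random features instead.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two (n_frames, dim) feature matrices (assumed measure)."""
    # Center every feature dimension over the frames.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return float(cross / (norm_x * norm_y))


def similarity_matrix(layers):
    """Symmetric layer-by-layer similarity matrix (hypothetical helper)."""
    n = len(layers)
    sim = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = linear_cka(layers[i], layers[j])
    return sim


def cluster_layers(sim, n_groups):
    """Average-linkage agglomerative clustering on distance 1 - similarity.

    Starts with one cluster per layer and repeatedly merges the two clusters
    with the smallest mean pairwise distance until n_groups remain.
    """
    dist = 1.0 - sim
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because b > a
    return sorted(clusters)


if __name__ == "__main__":
    # Synthetic stand-in for per-layer features (6 layers, 200 frames, dim 16):
    # layers 0-2 share one latent signal, layers 3-5 share another.
    rng = np.random.default_rng(0)
    base_a, base_b = rng.standard_normal((2, 200, 16))
    feats = [base + 0.1 * rng.standard_normal((200, 16))
             for base in (base_a,) * 3 + (base_b,) * 3]
    print(cluster_layers(similarity_matrix(feats), n_groups=2))
```

In this toy setup, layers built from the same latent signal end up in the same group. How each group is then turned into a distillation target (for example, by averaging its layers' outputs) is not specified in the abstract and is left open here.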