Self-supervised pre-trained models such as Wav2vec 2.0, HuBERT, and WavLM have been shown to significantly improve performance on many speech tasks. However, their large memory footprint and high computational cost hinder their industrial applicability. Structured pruning is a hardware-friendly model compression technique, but it usually results in a larger loss of accuracy. In this paper, we propose a fine-grained attention head pruning method to compensate for the performance degradation. We also introduce the straight-through estimator into L0 regularization to further accelerate the pruned model. Experiments on the SUPERB benchmark show that our model achieves performance comparable to the dense model on multiple tasks and outperforms the Wav2vec 2.0 base model on average, with 72% fewer parameters and twice the inference speed.
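The abstract does not say how the straight-through estimator is combined with L0 regularization, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the hard-concrete gate that is commonly used for L0 regularization, with the usual constants (`BETA`, `GAMMA`, `ZETA`), and a hypothetical 0.5 threshold. In the forward pass each gate is binarized, so a pruned structure is exactly zero; in the backward pass the gradient of the relaxed gate is used as a stand-in. All function names and the example `head_log_alphas` values are illustrative.

```python
import math
import random

# Hard-concrete stretch constants commonly paired with L0 regularization
# (assumed values, not taken from the paper).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _stretched(log_alpha, u):
    """Return (s, s_bar): the sigmoid sample and its stretched version."""
    u = min(max(u, 1e-6), 1.0 - 1e-6)  # keep log() finite
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    return s, s * (ZETA - GAMMA) + GAMMA


def soft_gate(log_alpha, u):
    """Relaxed gate in [0, 1]: the stretched sample clamped to the unit interval."""
    _, s_bar = _stretched(log_alpha, u)
    return min(1.0, max(0.0, s_bar))


def soft_gate_grad(log_alpha, u):
    """Derivative of soft_gate w.r.t. log_alpha; zero where the clamp saturates."""
    s, s_bar = _stretched(log_alpha, u)
    if s_bar <= 0.0 or s_bar >= 1.0:
        return 0.0
    return s * (1.0 - s) * (ZETA - GAMMA) / BETA


def ste_gate(log_alpha, u, threshold=0.5):
    """Straight-through gate.

    Forward value: a binary mask (hypothetical 0.5 threshold).
    Backward value: the relaxed gate's gradient, used as a surrogate.
    """
    hard = 1.0 if soft_gate(log_alpha, u) > threshold else 0.0
    return hard, soft_gate_grad(log_alpha, u)


def expected_l0(log_alphas):
    """Differentiable L0 penalty: expected number of non-zero gates."""
    shift = BETA * math.log(-GAMMA / ZETA)
    return sum(sigmoid(a - shift) for a in log_alphas)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical gate parameters, one per prunable structure.
    head_log_alphas = [4.0, -4.0, 0.5, -0.5]
    for a in head_log_alphas:
        mask, grad = ste_gate(a, rng.random())
        print(a, mask, round(grad, 4))
    print("expected L0:", round(expected_l0(head_log_alphas), 4))
```

Because the forward mask is exactly 0 or 1, a structure whose gate is closed can be physically removed after training rather than merely scaled toward zero, which is where a speed-up would come from under this reading.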