Self-supervised pre-trained models such as Wav2vec 2.0, HuBERT, and WavLM have been shown to significantly improve performance on many speech tasks. However, their large memory footprint and heavy computational requirements hinder their industrial applicability. Structured pruning is a hardware-friendly model compression technique, but it usually incurs a larger accuracy loss than unstructured pruning. In this paper, we propose a fine-grained attention head pruning method to compensate for this performance degradation. We also introduce the straight-through estimator into L0 regularization to further accelerate the pruned model. Experiments on the SUPERB benchmark show that our model achieves performance comparable to the dense model on multiple tasks and outperforms the Wav2vec 2.0 base model on average, with 72% fewer parameters and 2 times faster inference.
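The combination of L0 regularization with a straight-through estimator mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the one-gate-per-head layout, the deterministic sigmoid gate (instead of a sampled hard-concrete gate), the 0.5 threshold, and all names (`ste_gate_forward`, `ste_gate_backward`, `l0_penalty`, `lam`) are assumptions made for illustration. The forward pass applies a hard 0/1 mask, so closed gates correspond to heads that could be physically removed, while the backward pass treats the mask as if it were the soft keep probability.

```python
import numpy as np


def ste_gate_forward(log_alpha):
    """Forward pass: binarize each gate (assumed threshold 0.5)."""
    prob = 1.0 / (1.0 + np.exp(-log_alpha))      # soft keep probability
    mask = (prob > 0.5).astype(log_alpha.dtype)  # hard 0/1 mask used in the forward pass
    return mask, prob


def ste_gate_backward(grad_mask, prob):
    """Backward pass: straight-through, i.e. differentiate as if mask == prob."""
    return grad_mask * prob * (1.0 - prob)       # d(sigmoid)/d(log_alpha)


def l0_penalty(prob):
    """Simplified surrogate for the L0 norm: expected number of open gates."""
    return prob.sum()


# Hypothetical toy setup: 4 attention heads, each producing an 8-dim output.
log_alpha = np.array([2.0, -1.0, 0.5, -3.0])
head_out = np.ones((4, 8))
lam = 0.1                                        # assumed sparsity weight

mask, prob = ste_gate_forward(log_alpha)
gated = head_out * mask[:, None]                 # closed gates zero out whole heads

# Toy loss = gated.sum() + lam * l0_penalty(prob); gradients derived by hand.
grad_gated = np.ones_like(gated)
grad_mask = (grad_gated * head_out).sum(axis=1)
grad_log_alpha = ste_gate_backward(grad_mask, prob) + lam * prob * (1.0 - prob)

print("mask:", mask)
print("grad wrt log_alpha:", grad_log_alpha)
```

Note that the closed gates (heads 2 and 4 in this toy setup) still receive a non-zero gradient, which is the point of the straight-through estimator: a pruned head can be reopened later in training even though it contributes nothing to the forward pass.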