This paper investigates how to improve post-training quantization (PTQ) of knowledge-distilled models in the Whisper family of speech foundation models. We address the challenge of outliers in weight and activation tensors, which are known to degrade quantization quality in transformer-based language and vision models. Extending this observation to Whisper, we demonstrate that such outliers are also present when transformer-based models are trained for automatic speech recognition, so PTQ requires a mitigation strategy. We show that a recently proposed gating mechanism in the attention blocks of the student model reduces these outliers, enabling effective 8-bit quantization and yielding lower word error rates than student models without the gating mechanism.
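To illustrate why outliers impede 8-bit quantization, the following is a minimal sketch, not the quantization scheme used in this work. It assumes symmetric per-tensor absmax quantization, synthetic Gaussian activations, and a single hypothetical outlier of magnitude 100; the function names are illustrative only. Because the quantization step is set by the largest absolute value, one outlier coarsens the grid for every other entry in the tensor.

```python
import numpy as np


def quantize_int8_absmax(x):
    """Symmetric per-tensor int8 quantization (assumed scheme, for illustration)."""
    # The step size is determined by the largest magnitude in the tensor.
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to floating point."""
    return q.astype(np.float32) * scale


def mean_abs_error(x):
    """Mean absolute round-trip error of quantize -> dequantize."""
    q, scale = quantize_int8_absmax(x)
    return float(np.mean(np.abs(x - dequantize(q, scale))))


rng = np.random.default_rng(0)
# Synthetic stand-in for an activation tensor (assumption, not Whisper data).
activations = rng.normal(0.0, 1.0, size=4096).astype(np.float32)

# Same tensor with one hypothetical outlier injected.
with_outlier = activations.copy()
with_outlier[0] = 100.0

err_clean = mean_abs_error(activations)
err_outlier = mean_abs_error(with_outlier)

print(f"mean abs error without outlier: {err_clean:.4f}")
print(f"mean abs error with outlier:    {err_outlier:.4f}")
```

In this toy setting the round-trip error grows substantially once the outlier is present, which is the effect that outlier-reduction mechanisms such as attention gating aim to avoid.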