As a popular multilingual, multitask pre-trained speech model, Whisper suffers from the curse of multilinguality. To enhance the multilingual capabilities of small Whisper models, we propose DQ-Whisper, a novel joint distillation and quantization framework that compresses Whisper for efficient inference. First, we introduce a dynamic matching distillation strategy. Second, we develop a quantization-aware distillation framework that integrates quantization into the distillation process. Experimental results on various multilingual datasets show that the proposed distillation approach effectively enhances the multilingual capabilities of small Whisper models without increasing computational cost, achieving up to a 5.18x reduction in model size with only marginal performance degradation. In addition, quantization is compatible with distillation, enabling an even higher compression rate.
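To make the quantization-aware distillation idea concrete, the following is a minimal sketch assuming a PyTorch-style training loop; `fake_quantize`, `QuantLinear`, `qad_loss`, and all hyperparameters (bit width, temperature, loss weighting) are illustrative assumptions, not the paper's actual DQ-Whisper implementation.

```python
# A minimal, illustrative sketch of quantization-aware distillation (QAD),
# assuming PyTorch. Module names, bit width, and loss weights below are
# hypothetical placeholders, not the paper's DQ-Whisper implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate uniform symmetric quantization in the forward pass while
    keeping gradients flowing via the straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()  # straight-through estimator


class QuantLinear(nn.Linear):
    """Linear layer whose weights are fake-quantized during training."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, fake_quantize(self.weight), self.bias)


def qad_loss(student_logits, teacher_logits, targets,
             temperature: float = 2.0, alpha: float = 0.5) -> torch.Tensor:
    """Combine hard-label cross-entropy with a soft KL distillation term."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1 - alpha) * kd


# Toy usage: a fake-quantized student learns from a full-precision teacher.
teacher = nn.Linear(16, 10)
student = QuantLinear(16, 10)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(4, 16)
y = torch.randint(0, 10, (4,))
with torch.no_grad():
    t_logits = teacher(x)
loss = qad_loss(student(x), t_logits, y)
loss.backward()
opt.step()
```

The key design point this sketch illustrates is that quantization error is exposed to the distillation loss during training, so the student learns weights that remain close to the teacher's behavior after quantization, rather than being quantized post hoc.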