Deep learning-based speech enhancement (SE) models have recently outperformed traditional techniques, yet their deployment on resource-constrained devices remains challenging due to high computational and memory demands. This paper introduces a novel dynamic frequency-adaptive knowledge distillation (DFKD) approach to effectively compress SE models. Our method dynamically assesses the model's output, distinguishing between high- and low-frequency components, and adapts the learning objectives to the distinct requirements of each frequency band, capitalizing on the inherent characteristics of the SE task. To evaluate the efficacy of DFKD, we conducted experiments on three state-of-the-art models: DCCRN, Conv-TasNet, and DPTNet. The results demonstrate that our method not only significantly enhances the performance of the compressed (student) model but also surpasses other logit-based knowledge distillation methods on SE tasks.
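To make the core idea concrete, the following is a minimal PyTorch sketch of a frequency-adaptive distillation loss in the spirit described above. It is an illustration under stated assumptions, not the paper's exact DFKD objective: the fixed band-split bin, the L1 distances, and the exponential rule that down-weights the teacher in bands where it strays from the clean target are all assumptions introduced here for clarity.

```python
# Illustrative sketch of frequency-adaptive distillation (NOT the paper's
# exact DFKD loss). Shapes, the split bin, and the dynamic weighting rule
# are assumptions made for this example.
import torch
import torch.nn.functional as F

def frequency_adaptive_kd_loss(student_spec, teacher_spec, clean_spec,
                               split_bin=64, alpha=0.5):
    """Distillation loss computed separately per frequency band.

    Args:
        student_spec: (batch, freq, time) student magnitude spectrogram.
        teacher_spec: (batch, freq, time) teacher magnitude spectrogram.
        clean_spec:   (batch, freq, time) clean-target magnitude spectrogram.
        split_bin:    frequency bin separating the low and high bands (assumed).
        alpha:        base weight on the distillation term (assumed).
    """
    band_losses = []
    for band in (slice(0, split_bin), slice(split_bin, None)):
        s = student_spec[:, band]
        t = teacher_spec[:, band]
        c = clean_spec[:, band]
        # "Dynamic assessment" of this band: measure how far the teacher is
        # from the clean target, and trust it less where it is less reliable.
        teacher_err = F.l1_loss(t, c).detach()
        w_kd = alpha * torch.exp(-teacher_err)
        # Blend distillation (match the teacher) with the ground-truth
        # objective (match the clean target), per band.
        band_losses.append(w_kd * F.l1_loss(s, t) + (1 - w_kd) * F.l1_loss(s, c))
    return sum(band_losses) / len(band_losses)
```

Splitting the loss by band reflects the SE-specific observation motivating the method: low-frequency bins carry most of the speech energy, while high-frequency bins carry fine detail where a compressed student and its teacher tend to disagree most, so a single global objective serves neither band well.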