In speech enhancement, knowledge distillation (KD) compresses models by transferring a high-capacity teacher's knowledge to a compact student. However, conventional KD methods train the student to mimic the teacher's output in its entirety: the student is forced to imitate regions where the teacher performs poorly, and distillation is spent on regions where the student already performs well, yielding only marginal gains. We propose Distilling Selective Patches (DISPatch), a KD framework for speech enhancement that applies the distillation loss only to spectrogram patches where the teacher outperforms the student, as determined by a Knowledge Gap Score. This guides optimization toward the areas with the greatest potential for student improvement while minimizing the influence of regions where the teacher may provide unreliable instruction. Furthermore, we introduce Multi-Scale Selective Patches (MSSP), a frequency-dependent method that uses different patch sizes in the low- and high-frequency bands to account for spectral heterogeneity. We incorporate DISPatch into conventional KD methods and observe consistent gains for compact students. Moreover, integrating DISPatch and MSSP into a state-of-the-art frequency-dependent KD method considerably improves performance across all metrics.
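To make the patch-selection step concrete, below is a minimal PyTorch-style sketch of selective-patch distillation in the spirit of DISPatch. The paper's exact Knowledge Gap Score and patch sizes are not reproduced here; as stated assumptions, each patch is scored by the difference between the student's and the teacher's per-patch error against the clean spectrogram, and the distillation loss is kept only where the teacher is better. The function name `dispatch_loss` and the `patch` argument are hypothetical.

```python
# Minimal sketch (not the paper's implementation): selective-patch distillation
# where the Knowledge Gap Score is approximated by the difference between the
# student's and the teacher's per-patch error against the clean spectrogram.
import torch
import torch.nn.functional as F


def dispatch_loss(student_mag, teacher_mag, clean_mag, patch=(16, 16)):
    """KD loss restricted to patches where the teacher outperforms the student.
    Inputs are magnitude spectrograms of shape (B, F, T); `patch` is a
    hypothetical (freq, time) patch size, not taken from the paper."""
    pf, pt = patch
    # Per-bin squared errors against the clean reference.
    err_s = (student_mag - clean_mag) ** 2
    err_t = (teacher_mag - clean_mag) ** 2
    # Average errors over non-overlapping patches.
    pool = lambda x: F.avg_pool2d(x.unsqueeze(1), kernel_size=(pf, pt), stride=(pf, pt))
    gap = pool(err_s) - pool(err_t)           # assumed gap score: >0 where teacher is better
    mask = (gap > 0).float()                  # select only those patches
    # Upsample the patch mask back to bin resolution and mask the KD loss.
    mask = F.interpolate(mask, size=student_mag.shape[-2:], mode="nearest").squeeze(1)
    kd = ((student_mag - teacher_mag) ** 2) * mask
    return kd.sum() / mask.sum().clamp(min=1.0)


# Toy usage with random tensors standing in for spectrograms.
B, Fbins, T = 2, 64, 128
s, t, c = (torch.rand(B, Fbins, T) for _ in range(3))
print(dispatch_loss(s, t, c).item())
```

An MSSP-style variant, under the same assumptions, would simply evaluate this loss with a different `patch` size on the low- and high-frequency bands of the spectrogram and sum the two terms.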