Tiny, causal models are crucial for embedded audio machine learning applications. Model compression can be achieved by distilling knowledge from a large teacher model into a smaller student model. In this work, we propose a novel two-step approach to distilling tiny speech enhancement models. In contrast to the standard approach of training on a weighted mixture of distillation and supervised losses, we first pre-train the student using only the knowledge distillation (KD) objective and then switch to a fully supervised training regime. We also propose a novel fine-grained similarity-preserving KD loss, which aims to match the student's intra-activation Gram matrices to those of the teacher. Our method yields broad improvements and is particularly effective in adverse conditions such as high compression and low signal-to-noise ratio (SNR), with signal-to-distortion ratio gains of 0.9 dB and 1.1 dB over the baseline at -5 dB input SNR and 63x compression, respectively.
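To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each activation is a 2-D array of shape (time frames, channels), that the Gram matrix is the row-normalised product of the activation with its own transpose, and that the KD loss is the mean squared difference between student and teacher Gram matrices. The abstract fixes none of these details, and the names `gram_matrix`, `gram_kd_loss`, `loss_for_epoch`, and `kd_epochs` are hypothetical.

```python
import numpy as np


def gram_matrix(act):
    """Row-normalised self-similarity (Gram) matrix of one activation.

    act: array of shape (T, C), T time frames by C channels (assumed layout).
    Returns a (T, T) matrix, so the channel count drops out and a narrow
    student can be compared with a wide teacher.
    """
    g = act @ act.T
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    return g / np.maximum(norms, 1e-12)  # guard against all-zero rows


def gram_kd_loss(student_act, teacher_act):
    """Mean squared difference between student and teacher Gram matrices."""
    gs = gram_matrix(student_act)
    gt = gram_matrix(teacher_act)
    return float(np.mean((gs - gt) ** 2))


def loss_for_epoch(epoch, kd_epochs, kd_loss, supervised_loss):
    """Two-step schedule: KD objective only, then supervised objective only.

    kd_epochs is a hypothetical hyperparameter giving the length of step one.
    """
    return kd_loss if epoch < kd_epochs else supervised_loss


# Illustrative random activations: the teacher has 256 channels, the
# student only 16, but both cover the same 50 time frames.
rng = np.random.default_rng(0)
teacher_act = rng.standard_normal((50, 256))
student_act = rng.standard_normal((50, 16))

print("KD loss, student vs teacher:", gram_kd_loss(student_act, teacher_act))
print("KD loss, teacher vs itself:", gram_kd_loss(teacher_act, teacher_act))
```

Because the Gram matrix is computed over time frames, its size does not depend on the number of channels, which is what lets a heavily compressed student be matched against a much wider teacher without an extra projection layer. The schedule function contrasts with the weighted-mixture baseline, where both losses would be summed with fixed weights at every epoch.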