In this paper, we propose an intra-set and inter-set recursive fusion framework with time-frequency calibrated knowledge distillation (I$^2$SRF-TFCKD) for speech enhancement (SE). Unlike previous distillation strategies for SE, the proposed framework fully exploits the time-frequency differential information of speech while facilitating both local information focusing and global knowledge circulation. First, we construct a collaborative distillation paradigm that covers intra-set and inter-set correlations. Within a correlated set, multi-layer teacher-student features are matched pairwise for calibrated distillation. We then generate representative features from each correlated set through recursive fusion, and these form a fused feature set that enables inter-set knowledge interaction. Second, we propose a multi-layer interactive distillation scheme based on dual-stream time-frequency cross-calibration. It computes teacher-student similarity calibration weights separately in the time and frequency domains and applies them through cross-weighting, so that distillation contributions are allocated across layers in a refined manner according to speech characteristics. The proposed distillation strategy is applied to the dual-path dilated convolutional recurrent network (DPDCRN), which ranked first in the SE track of the L3DAS23 challenge. To evaluate the effectiveness of I$^2$SRF-TFCKD, we conduct experiments on both single-channel and multi-channel SE datasets. Objective evaluations demonstrate that the proposed knowledge distillation (KD) strategy consistently improves the performance of the low-complexity student model and outperforms other distillation schemes.
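
To make the dual-stream time-frequency cross-calibration idea more concrete, the following is a minimal sketch under stated assumptions, not the formulation used in this work. It assumes that teacher and student features are already aligned to a common shape `(layers, channels, time, freq)`, that similarity is measured by cosine similarity, that the calibration weights are normalised across layers with a softmax, and that cross-weighting can be modelled as the outer product of the time and frequency weights. The function name `tf_cross_calibrated_loss` and its helpers are hypothetical.

```python
import numpy as np


def cosine(a, b, axis, eps=1e-8):
    """Cosine similarity between a and b, reduced over the given axes."""
    num = np.sum(a * b, axis=axis)
    den = np.sqrt(np.sum(a * a, axis=axis)) * np.sqrt(np.sum(b * b, axis=axis))
    return num / (den + eps)


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    z = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return z / np.sum(z, axis=axis, keepdims=True)


def tf_cross_calibrated_loss(teacher, student):
    """Hypothetical loss for features of shape (layers, channels, time, freq)."""
    # Time stream: one similarity per layer and frame, reduced over (channels, freq).
    sim_t = cosine(teacher, student, axis=(1, 3))    # shape (L, T)
    # Frequency stream: one similarity per layer and bin, reduced over (channels, time).
    sim_f = cosine(teacher, student, axis=(1, 2))    # shape (L, F)
    # Assumption: weights are normalised across layers, so layers that agree
    # more with the teacher get larger weights (the inverse is equally plausible).
    w_t = softmax(sim_t, axis=0)                     # shape (L, T)
    w_f = softmax(sim_f, axis=0)                     # shape (L, F)
    # Assumption: cross-weighting is modelled as the outer product of both streams.
    weights = w_t[:, :, None] * w_f[:, None, :]      # shape (L, T, F)
    # Squared error per time-frequency bin, averaged over channels.
    err = np.mean((teacher - student) ** 2, axis=1)  # shape (L, T, F)
    # Sum over layers, average over time-frequency bins.
    return float(np.sum(weights * err) / (err.shape[1] * err.shape[2]))


# Usage with random stand-in features: 3 layers, 4 channels, 5 frames, 6 bins.
rng = np.random.default_rng(0)
teacher_feats = rng.standard_normal((3, 4, 5, 6))
student_feats = teacher_feats + 0.1 * rng.standard_normal((3, 4, 5, 6))
example_loss = tf_cross_calibrated_loss(teacher_feats, student_feats)
```

In this sketch each time frame and each frequency bin distributes its weight over the layers independently, which is one way to read "refined allocation of distillation contributions across different layers"; a learned or attention-based weighting would be an equally reasonable reading.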