Most research in synthetic speech detection (SSD) focuses on improving performance on standard noise-free datasets. In real-world conditions, however, noise interference is usually present and causes significant performance degradation in SSD systems. To improve noise robustness, this paper proposes a dual-branch knowledge distillation synthetic speech detection (DKDSSD) method. Specifically, parallel data flows are designed through a clean teacher branch and a noisy student branch, and an interactive fusion module and a response-based teacher-student paradigm are proposed to guide the training on noisy data from both the data-distribution and decision-making perspectives. In the noisy student branch, speech enhancement is first applied for denoising to reduce the interference of strong noise. The proposed interactive fusion combines denoised and noisy features to mitigate the impact of speech distortion and to keep the representation consistent with the data distribution of the clean branch. The teacher-student paradigm maps the student's decision space onto the teacher's decision space, enabling noisy speech to behave similarly to clean speech. In addition, a joint training method optimizes both branches toward a global optimum. Experimental results on multiple datasets demonstrate that the proposed method performs effectively in noisy environments and maintains its performance in cross-dataset experiments. Source code is available at https://github.com/fchest/DKDSSD.
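As a rough illustration of the two training signals described above (the interactive fusion of denoised and noisy features, and the response-based teacher-student distillation), the following PyTorch sketch shows one plausible formulation; the gating scheme, temperature, and loss weighting are assumptions for illustration, not the authors' exact implementation from the released code.

```python
# Minimal sketch, assuming a PyTorch setup with hypothetical tensor shapes.
import torch
import torch.nn.functional as F

def interactive_fusion(denoised_feat, noisy_feat):
    """Illustrative fusion: a learned-free sigmoid gate blends denoised and
    noisy features, so distortion introduced by enhancement can be
    compensated by the original noisy evidence (gating scheme is an
    assumption, not the paper's exact module)."""
    gate = torch.sigmoid(denoised_feat + noisy_feat)          # (B, C, T)
    return gate * denoised_feat + (1.0 - gate) * noisy_feat

def response_distillation_loss(student_logits, teacher_logits, labels,
                               T=2.0, alpha=0.5):
    """Response-based distillation: align the noisy student's decision space
    with the clean teacher's via KL divergence on temperature-softened
    logits, combined with cross-entropy on the ground-truth labels."""
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage with random tensors standing in for the two branch outputs.
B, C, T_frames, n_cls = 4, 64, 100, 2
fused = interactive_fusion(torch.randn(B, C, T_frames),
                           torch.randn(B, C, T_frames))
loss = response_distillation_loss(torch.randn(B, n_cls),
                                  torch.randn(B, n_cls),
                                  torch.randint(0, n_cls, (B,)))
```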