Most research on fake audio detection (FAD) focuses on improving performance on standard noise-free datasets. In real-world conditions, however, noise interference is common and causes significant performance degradation in FAD systems. To improve noise robustness, we propose a dual-branch knowledge distillation fake audio detection (DKDFAD) method. Specifically, we design a parallel data flow consisting of a clean teacher branch and a noisy student branch, and we propose interactive fusion and a response-based teacher-student paradigm to guide training on noisy data from the perspectives of data distribution and decision making. In the noisy branch, speech enhancement is first applied for denoising, which reduces the interference of strong noise. The proposed interactive fusion then combines the denoised features with the noise features to reduce the impact of speech distortion and to seek consistency with the data distribution of the clean branch. The teacher-student paradigm maps the student's decision space to the teacher's decision space, so that decisions on noisy speech resemble those on clean speech. In addition, the two branches are optimized with a joint training method to approach a global optimum. Experimental results on multiple datasets show that the proposed method performs well in noisy environments and maintains its performance in cross-dataset experiments.
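The response-based teacher-student paradigm and the joint training of both branches can be illustrated with a small loss computation. The abstract does not state the actual loss, so the following is only a minimal sketch under assumptions: it uses the common temperature-softened KL divergence between teacher and student outputs as the distillation term and adds a cross-entropy term for each branch. The function names, the `temperature` value, and the weight `alpha` are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_sum for s in scaled]


def cross_entropy(logits, label):
    """Cross-entropy of one example against an integer class label."""
    return -log_softmax(logits)[label]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Response-based distillation term (assumed form, not from the paper).

    Computes KL(teacher || student) on temperature-softened outputs and
    scales it by temperature**2, as is common in knowledge distillation.
    """
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    return temperature ** 2 * kl


def joint_loss(teacher_logits, student_logits, label, alpha=0.5, temperature=2.0):
    """Joint objective over both branches (assumed weighting).

    teacher_logits: output of the clean branch for an utterance
    student_logits: output of the noisy branch for the same utterance plus noise
    label: 0 or 1 for the two classes (which index means "fake" is arbitrary here)
    """
    supervised = cross_entropy(teacher_logits, label) + cross_entropy(student_logits, label)
    distill = distillation_loss(student_logits, teacher_logits, temperature)
    return supervised + alpha * distill


if __name__ == "__main__":
    clean_branch = [2.0, -1.0]   # hypothetical teacher logits
    noisy_branch = [0.3, 0.1]    # hypothetical student logits
    print(joint_loss(clean_branch, noisy_branch, label=0))
```

The distillation term is zero when the two branches produce identical outputs and grows as the noisy branch's decision drifts away from the clean branch's, which is one way to read "mapping the student's decision space to the teacher's". Whether gradients from this term also update the teacher during joint training is a design choice that the abstract leaves open.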