The availability of highly convincing audio deepfake generators underscores the need for robust audio deepfake detectors. Existing works often rely solely on the real and fake data available in the training set, which can lead to overfitting and reduced robustness to unseen manipulations. To enhance the generalization capabilities of audio deepfake detectors, we propose a novel augmentation method that generates audio pseudo-fakes targeting the model's decision boundary. Inspired by adversarial attacks, we perturb original real data to synthesize pseudo-fakes with ambiguous prediction probabilities. Comprehensive experiments on two well-known architectures demonstrate that the proposed augmentation improves their generalization capabilities.
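The core idea — perturbing real samples so the detector's output probability drifts toward the 0.5 decision boundary — can be illustrated with a minimal sketch. The snippet below is a hypothetical toy version, not the paper's implementation: it assumes a simple logistic detector with known weights `w` and bias `b` (so the gradient is analytic) and takes FGSM-style signed gradient steps that minimize the squared distance between the predicted fake-probability and 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_pseudo_fake(w, b, x, step_size=0.05, n_steps=100):
    """Perturb a real feature vector x so that a logistic detector's
    fake-probability moves toward 0.5 (the decision boundary).

    Toy sketch: the detector is p(fake) = sigmoid(w @ x + b), and we
    minimize (p - 0.5)^2 with FGSM-style signed gradient steps."""
    x_adv = x.astype(float).copy()
    for _ in range(n_steps):
        p = sigmoid(w @ x_adv + b)
        # d/dx of (p - 0.5)^2 = 2 (p - 0.5) * p (1 - p) * w
        grad = 2.0 * (p - 0.5) * p * (1.0 - p) * w
        x_adv -= step_size * np.sign(grad)  # signed step toward boundary
    return x_adv
```

For a real deepfake detector the analytic gradient would be replaced by backpropagation through the network, but the loop structure — iteratively pushing a real sample until its prediction becomes ambiguous — is the same.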