Current speech deepfake detection approaches perform satisfactorily against known attacks; however, generalization to unseen attacks remains an open challenge. The proliferation of speech deepfakes on social media underscores the need for systems that generalize to attacks not observed during training. We address this problem from the perspective of meta-learning, aiming to learn attack-invariant features so that the detector can adapt to unseen attacks from very few samples. This approach is attractive because generating a large-scale training dataset for every new attack is often expensive or infeasible. Our experiments demonstrate an improvement in the Equal Error Rate (EER) from 21.67% to 10.42% on the InTheWild dataset, using just 96 samples from the unseen target data. Continuous few-shot adaptation ensures that the system remains up to date as new attacks emerge.
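For readers unfamiliar with the metric, the EER reported above is the operating point at which the false acceptance rate (spoofed speech accepted) equals the false rejection rate (bona fide speech rejected). A minimal sketch of how it can be computed from detector scores follows; this is an illustration, not the paper's evaluation code, and the score convention (higher score means more likely bona fide) is an assumption.

```python
def equal_error_rate(bona_scores, spoof_scores):
    """Estimate the EER by sweeping decision thresholds over all
    observed scores and picking the point where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest.

    Assumes higher scores indicate bona fide (genuine) speech.
    """
    thresholds = sorted(set(bona_scores) | set(spoof_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # FAR: fraction of spoofed samples scored at or above the threshold
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # FRR: fraction of bona fide samples scored below the threshold
        frr = sum(s < t for s in bona_scores) / len(bona_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer
```

With perfectly separated scores the EER is 0; a detector at chance level sits near 0.5 (50%), so the reported drop from 21.67% to 10.42% roughly halves the balanced error of the system on InTheWild.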