Current speech deepfake detection approaches perform satisfactorily against known adversaries; however, generalization to unseen attacks remains an open challenge. The proliferation of speech deepfakes on social media underscores the need for systems that can generalize to attacks not observed during training. We address this problem from the perspective of meta-learning, aiming to learn attack-invariant features so the detector can adapt to unseen attacks from very few samples. This approach is promising because generating a large-scale training dataset for each new attack is often expensive or infeasible. Our experiments demonstrate an improvement in Equal Error Rate (EER) from 21.67% to 10.42% on the InTheWild dataset, using just 96 samples from the unseen dataset. Continuous few-shot adaptation keeps the system up to date as new attacks emerge.
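To make the few-shot adaptation idea concrete, the sketch below illustrates one common way such adaptation can work: class prototypes are computed from a small labeled support set (here 96 samples, matching the abstract's setting) in an embedding space, and queries are scored by their relative distance to the prototypes, from which an EER is computed. This is a minimal illustration, not the paper's method: the synthetic Gaussian embeddings stand in for features from a meta-learned, attack-invariant encoder, and the prototype-based scoring is an assumption for demonstration purposes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for embeddings; in practice these would come from a
# pretrained, attack-invariant feature extractor (assumption for illustration).
def sample_embeddings(mean, n, dim=16):
    return rng.normal(mean, 1.0, size=(n, dim))

# Few-shot "support" set: 96 labeled samples from the unseen domain (48 per class).
support_real = sample_embeddings(0.0, 48)
support_fake = sample_embeddings(1.5, 48)

# Prototype per class = mean support embedding (prototypical-network-style adaptation).
proto_real = support_real.mean(axis=0)
proto_fake = support_fake.mean(axis=0)

def score(x):
    # Higher score = more spoof-like: distance to the real prototype
    # minus distance to the fake prototype.
    return np.linalg.norm(x - proto_real, axis=1) - np.linalg.norm(x - proto_fake, axis=1)

# Query set from the same unseen domain.
query_real = sample_embeddings(0.0, 200)
query_fake = sample_embeddings(1.5, 200)

def eer(scores_neg, scores_pos):
    # Equal Error Rate: operating point where the false-accept rate
    # (real scored as fake) equals the false-reject rate (fake scored as real).
    thresholds = np.sort(np.concatenate([scores_neg, scores_pos]))
    best = 1.0
    for t in thresholds:
        far = np.mean(scores_neg >= t)
        frr = np.mean(scores_pos < t)
        best = min(best, max(far, frr))
    return best

print(f"EER on synthetic queries: {eer(score(query_real), score(query_fake)):.3f}")
```

Because adaptation here is just a mean over 96 support embeddings, it requires no gradient updates, which is one reason prototype-style few-shot methods are attractive when labeled data from a new attack is scarce.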