Deepfake videos and images are becoming increasingly credible, posing a significant threat given their potential to facilitate fraud or bypass access control systems. This has motivated the development of deepfake detection methods, in which deep learning models are trained to distinguish between real and synthesized footage. Unfortunately, existing detection models struggle to generalize to deepfakes from datasets they were not trained on, yet little work has examined why this happens or how the limitation can be addressed. In this paper, we present the first empirical study on the generalizability of deepfake detectors, a property essential for detectors to stay one step ahead of attackers. Our study uses six deepfake datasets, five deepfake detection methods, and two model augmentation approaches, and confirms that detectors do not generalize in zero-shot settings. Additionally, we find that detectors learn unwanted properties specific to individual synthesis methods and struggle to extract discriminative features, which limits their ability to generalize. Finally, we find that some neurons contribute to detection universally across seen and unseen datasets, illuminating a possible path toward zero-shot generalizability.