Deepfakes have taken the world by storm, triggering a crisis of trust. Current deepfake detection methods typically generalize poorly: they tend to overfit to image content, such as the background, that appears frequently in the training set but is relatively unimportant for detection. Furthermore, current methods rely heavily on a few dominant forgery regions and may ignore other equally important regions, so forgery cues are not fully uncovered. In this paper, we address these shortcomings from three aspects: (1) We propose a novel two-stream network that effectively enlarges the potential regions from which the model extracts forgery evidence. (2) We devise three functional modules that handle multi-stream and multi-scale features in a collaborative learning scheme. (3) Because forgery annotations are difficult to obtain, we propose a Semi-supervised Patch Similarity Learning strategy to estimate patch-level annotations of forged locations. Empirically, our method shows significantly improved robustness and generalizability, outperforming previous methods on six benchmarks; it improves the frame-level AUC on the Deepfake Detection Challenge preview dataset from 0.797 to 0.835 and the video-level AUC on the CelebDF_v1 dataset from 0.811 to 0.847. Our implementation is available at https://github.com/sccsok/Locate-and-Verify.
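The results above are reported as frame-level and video-level AUC. The sketch below shows one common way to compute both. It is not the authors' evaluation code. The abstract does not state how frame scores are pooled into a video score, so the sketch assumes mean pooling. The function names `auc` and `video_level_auc` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen fake (label 1) scores higher than a randomly chosen
    real (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def video_level_auc(video_ids, labels, scores):
    """Pool frame scores per video by their mean (an assumption; other
    pooling rules are possible), then compute AUC over videos."""
    frame_scores = defaultdict(list)
    video_label = {}
    for vid, y, s in zip(video_ids, labels, scores):
        frame_scores[vid].append(s)
        video_label[vid] = y  # assumes every frame of a video shares its label
    vids = sorted(frame_scores)
    return auc([video_label[v] for v in vids],
               [mean(frame_scores[v]) for v in vids])


# Hypothetical toy data: two fake videos (a, b) and two real videos (c, d),
# with two frames each.
video_ids = ["a", "a", "b", "b", "c", "c", "d", "d"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.5, 0.3, 0.2, 0.65]

print(auc(labels, scores))                         # 0.8125
print(video_level_auc(video_ids, labels, scores))  # 1.0
```

In this toy example, averaging over frames removes the effect of individual poorly scored frames. As a result, the video-level AUC can differ noticeably from the frame-level AUC, which is why the two metrics are reported separately.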