Rapid advances in computer vision have driven remarkable progress in face forgery techniques, drawing the attention of researchers working to detect forgeries and precisely localize manipulated regions. However, because fine-grained pixel-wise supervision labels are limited, deepfake detection models perform unsatisfactorily on precise forgery detection and localization. To address this challenge, we introduce a well-trained vision segmentation foundation model, the Segment Anything Model (SAM), into face forgery detection and localization. Building on SAM, we propose the Detect Any Deepfakes (DADF) framework with a Multiscale Adapter, which captures short- and long-range forgery contexts for efficient fine-tuning. Moreover, to better identify forged traces and increase the model's sensitivity to forged regions, we propose a Reconstruction Guided Attention (RGA) module. The proposed framework jointly optimizes forgery localization and detection end to end. Extensive experiments on three benchmark datasets demonstrate the superiority of our approach for both forgery detection and localization. The code will be released soon at https://github.com/laiyingxin2/DADF.