Blind detection of forged regions in digital images is an effective means of authentication against the malicious use of local image editing techniques. Existing encoder-decoder forensic networks overlook the fact that detecting complex and subtle tampered regions typically requires more feedback information. In this paper, we propose a Progressive FeedbACk-enhanced Transformer (ProFact) network to achieve coarse-to-fine image forgery localization. Specifically, the coarse localization map generated by an initial branch network is adaptively fed back to the early transformer encoder layers to enhance the representation of positive features while suppressing interference factors. The cascaded transformer network, combined with a contextual spatial pyramid module, is designed to refine discriminative forensic features and thereby improve the accuracy and reliability of forgery localization. Furthermore, we present an effective strategy for automatically generating large-scale forged image samples that are close to real-world forensic scenarios, particularly in terms of realistic and coherent processing. Leveraging such samples, we apply a progressive and cost-effective two-stage training protocol to the ProFact network. Extensive experimental results on nine public forensic datasets show that the proposed localizer greatly outperforms state-of-the-art methods in the generalization ability and robustness of image forgery localization. Code will be publicly available at https://github.com/multimediaFor/ProFact.
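The feedback step described in the abstract, in which a coarse localization map is fed back to reweight early encoder features, can be illustrated with a minimal sketch. This is not the ProFact implementation: the function names `upsample_nearest` and `apply_feedback`, the fixed gain `alpha`, and the multiplicative gate `1 + alpha * map` are all assumptions chosen for illustration, whereas the abstract describes the feedback as adaptive. The sketch only amplifies features in likely forged regions and does not model any explicit suppression of interference.

```python
# Hypothetical illustration of coarse-map feedback; not the authors' method.

def upsample_nearest(mask, out_h, out_w):
    """Resize a 2-D map (list of rows) to out_h x out_w by nearest-neighbor sampling."""
    in_h, in_w = len(mask), len(mask[0])
    return [
        [mask[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def apply_feedback(features, coarse_map, alpha=1.0):
    """Scale features[ch][r][c] by 1 + alpha * coarse_map[r][c].

    features:   list of channels, each an H x W list of floats.
    coarse_map: forgery probabilities in [0, 1], possibly at lower resolution.
    alpha:      assumed fixed gain; a real network would learn this weighting.
    """
    h, w = len(features[0]), len(features[0][0])
    gate = upsample_nearest(coarse_map, h, w)
    return [
        [[ch[r][c] * (1.0 + alpha * gate[r][c]) for c in range(w)] for r in range(h)]
        for ch in features
    ]


# Two 4x4 feature channels of ones and a 2x2 coarse map.
features = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
coarse_map = [[0.0, 1.0], [0.0, 0.5]]
enhanced = apply_feedback(features, coarse_map, alpha=1.0)
```

In this toy example, feature values under the high-probability quadrant of the coarse map are doubled, while values where the map is zero pass through unchanged.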