The extraordinary ability of generative models to edit images and generate realistic ones has become a new trend, posing a serious threat to the trustworthiness of multimedia data and driving research on image manipulation detection and localization (IMDL). However, the lack of a large-scale data foundation makes the IMDL task difficult to pursue effectively. In this paper, we design a local manipulation pipeline that incorporates the powerful SAM, ChatGPT, and generative models. On this basis, we propose the GIM dataset, which has the following advantages: 1) Large scale, including over one million pairs of AI-manipulated images and real images. 2) Rich image content, encompassing a broad range of image classes. 3) Diverse generative manipulation, with images manipulated by state-of-the-art generators across various manipulation tasks. These advantages allow for a more comprehensive evaluation of IMDL methods and extend their applicability to diverse images. We introduce two benchmark settings to evaluate the generalization capability and comprehensive performance of baseline methods. In addition, we propose a novel IMDL framework, termed GIMFormer, which consists of a ShadowTracer, a Frequency-Spatial Block (FSB), and a Multi-window Anomalous Modelling (MWAM) module. Extensive experiments on GIM demonstrate that GIMFormer significantly surpasses previous state-of-the-art methods on both benchmarks.