TGIF: Text-Guided Inpainting Forgery Dataset

Digital image manipulation has become increasingly accessible and realistic with the advent of generative AI technologies. Recent developments allow for text-guided inpainting, making sophisticated image edits possible with minimal effort. This poses new challenges for digital media forensics. For example, diffusion model-based approaches can either splice the inpainted region into the original image or regenerate the entire image. In the latter case, traditional image forgery localization (IFL) methods typically fail. This paper introduces the Text-Guided Inpainting Forgery (TGIF) dataset, a comprehensive collection of images designed to support the training and evaluation of image forgery localization and synthetic image detection (SID) methods. The TGIF dataset includes approximately 75k forged images, originating from popular open-source and commercial methods, namely SD2, SDXL, and Adobe Firefly. We benchmark several state-of-the-art IFL and SID methods on TGIF. Whereas traditional IFL methods can detect spliced images, they fail to detect regenerated inpainted images. Moreover, traditional SID methods may detect regenerated inpainted images as fake, but cannot localize the inpainted area. Finally, both IFL and SID methods fail when exposed to stronger compression, and they are less robust to modern compression algorithms such as WEBP. In conclusion, this work demonstrates the ineffectiveness of state-of-the-art detectors on local manipulations performed by modern generative approaches, and aspires to help with the development of more capable IFL and SID methods. The dataset and code can be downloaded at https://github.com/IDLabMedia/tgif-dataset.
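To make the distinction between spliced and fully regenerated inpainting concrete, the following toy sketch may help. It is not the TGIF generation pipeline. It assumes a grayscale image stored as nested lists of pixel values and a boolean mask marking the inpainted region, and the names `splice` and `changed_pixels` are hypothetical. The "regenerated" image is simulated by giving the masked pixel new content and slightly perturbing every other pixel, mimicking a model that re-synthesizes the whole image. Splicing changes only the masked pixels, whereas regeneration alters the entire image.

```python
from typing import List

# Illustrative representations (assumptions, not the dataset's format):
Image = List[List[int]]  # grayscale image as rows of 0-255 pixel values
Mask = List[List[bool]]  # True where the region was inpainted


def splice(original: Image, generated: Image, mask: Mask) -> Image:
    """Paste generated pixels into the original only where the mask is True."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


def changed_pixels(original: Image, edited: Image) -> int:
    """Count how many pixels of the edited image differ from the original."""
    return sum(
        o != e
        for o_row, e_row in zip(original, edited)
        for o, e in zip(o_row, e_row)
    )


if __name__ == "__main__":
    original = [[100, 100, 100],
                [100, 100, 100],
                [100, 100, 100]]
    mask = [[False, False, False],
            [False, True, False],
            [False, False, False]]
    # Hypothetical generator output: new content (200) in the masked pixel and
    # a slight perturbation (101) everywhere else, simulating full regeneration.
    regenerated = [[101, 101, 101],
                   [101, 200, 101],
                   [101, 101, 101]]
    spliced = splice(original, regenerated, mask)
    print(changed_pixels(original, spliced))      # 1: only the masked pixel
    print(changed_pixels(original, regenerated))  # 9: every pixel changed
```

In the spliced case the untouched pixels are bit-identical to the original, which leaves a local boundary for IFL methods to detect; in the regenerated case no such boundary exists in pixel space.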

翻译：随着生成式人工智能技术的出现，数字图像篡改已变得日益便捷且逼真。近期的进展使得文本引导的图像修复成为可能，从而能以极小的努力实现复杂的图像编辑。这为数字媒体取证带来了新的挑战。例如，基于扩散模型的方法既可以将修复区域拼接到原始图像中，也可以重新生成整幅图像。在后一种情况下，传统的图像伪造定位方法通常会失效。本文介绍了文本引导修复伪造数据集，这是一个全面的图像集合，旨在支持图像伪造定位和合成图像检测方法的训练与评估。TGIF数据集包含约7.5万张伪造图像，源自流行的开源和商业方法，即SD2、SDXL和Adobe Firefly。我们在TGIF上对多种先进的IFL和SID方法进行了基准测试。传统的IFL方法能够检测拼接图像，但无法检测重新生成的修复图像。此外，传统的SID方法可能将重新生成的修复图像检测为伪造，但无法定位修复区域。最后，当面临更强的压缩时，IFL和SID方法均告失败，而它们对现代压缩算法（如WEBP）的鲁棒性也较差。总之，这项工作展示了现有先进检测器在面对现代生成方法执行的局部篡改时的低效性，并期望有助于开发更强大的IFL和SID方法。数据集和代码可通过https://github.com/IDLabMedia/tgif-dataset下载。