AI-Generated Image Detection using a Cross-Attention Enhanced Dual-Stream Network

With the rapid evolution of AI-Generated Content (AIGC), forged images produced by this technology are more deceptive and require less human intervention than traditional computer-generated graphics (CG). However, because of the disparities between CG and AIGC, conventional CG detection methods are often inadequate for identifying AIGC-produced images. To address this issue, our research focuses on the text-to-image generation process in AIGC. We first assemble two text-to-image databases using two distinct AI systems, DALLE2 and DreamStudio. To holistically capture the inherent anomalies produced by AIGC, we develop a robust dual-stream network comprising a residual stream and a content stream. The former employs the Spatial Rich Model (SRM) to extract diverse texture information from images, while the latter captures additional forgery traces in the low-frequency domain, thereby extracting complementary information that the residual stream may overlook. To enhance the information exchange between the two streams, we incorporate a cross multi-head attention mechanism. Extensive comparative experiments on both databases show that our detection method consistently outperforms traditional CG detection techniques across a range of image resolutions. Moreover, our method demonstrates superior performance in a series of robustness tests and cross-database experiments. When applied to widely recognized traditional CG benchmarks such as SPL2018 and DsTok, our approach also significantly outperforms existing CG detection methods.
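The abstract does not specify the SRM filter bank, the content-stream architecture, the token dimensions, or exactly how the cross multi-head attention is wired. As a rough illustration only, the NumPy sketch below shows two of the named ingredients under stated assumptions. First, a residual-stream input produced by the well-known 5x5 "KV" high-pass kernel from the SRM family, used as a stand-in for the paper's (unstated) filter bank. Second, a bidirectional cross multi-head attention in which each stream's tokens act as queries over the other stream's keys and values, followed by a skip connection, which is also an assumption. The names `KV_KERNEL`, `srm_residual`, `cross_mha`, `exchange`, and `random_params` are hypothetical, and the low-frequency content stream itself is not modeled.

```python
import numpy as np

# Hypothetical sketch: the abstract does not give the paper's filter bank,
# layer sizes, or fusion details. KV_KERNEL is the well-known 5x5 "KV"
# high-pass kernel from the SRM family, used here as a stand-in.
KV_KERNEL = np.array([
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
], dtype=np.float64) / 12.0


def srm_residual(gray):
    """Residual-stream input: high-pass filter a 2-D image (valid region only)."""
    k = KV_KERNEL.shape[0]
    h, w = gray.shape
    out = np.empty((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The kernel is symmetric, so correlation equals convolution here.
            out[i, j] = np.sum(gray[i:i + k, j:j + k] * KV_KERNEL)
    return out


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_mha(queries, context, w_q, w_k, w_v, w_o, num_heads):
    """Multi-head attention where queries come from one stream and
    keys/values come from the other. queries: (n_q, d), context: (n_kv, d),
    every weight matrix: (d, d)."""
    n_q, d = queries.shape
    n_kv = context.shape[0]
    if d % num_heads:
        raise ValueError("d must be divisible by num_heads")
    d_h = d // num_heads

    def split(x, n):  # (n, d) -> (heads, n, d_h)
        return x.reshape(n, num_heads, d_h).transpose(1, 0, 2)

    q = split(queries @ w_q, n_q)
    k = split(context @ w_k, n_kv)
    v = split(context @ w_v, n_kv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_h))  # (heads, n_q, n_kv)
    heads = attn @ v                                          # (heads, n_q, d_h)
    merged = heads.transpose(1, 0, 2).reshape(n_q, d)         # concatenate heads
    return merged @ w_o, attn


def exchange(residual_tokens, content_tokens, params, num_heads=4):
    """Bidirectional cross-attention between the two streams, with an assumed
    skip connection so each stream keeps its own features."""
    r_upd, _ = cross_mha(residual_tokens, content_tokens,
                         *params["res_from_content"], num_heads)
    c_upd, _ = cross_mha(content_tokens, residual_tokens,
                         *params["content_from_res"], num_heads)
    return residual_tokens + r_upd, content_tokens + c_upd


def random_params(d, rng):
    """Random (untrained) projection weights: (w_q, w_k, w_v, w_o) per direction."""
    def mats():
        return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4))
    return {"res_from_content": mats(), "content_from_res": mats()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32))
    print(srm_residual(image).shape)  # (28, 28)

    d = 16
    residual_tokens = rng.normal(size=(10, d))  # e.g. 10 patch embeddings
    content_tokens = rng.normal(size=(6, d))
    r, c = exchange(residual_tokens, content_tokens, random_params(d, rng))
    print(r.shape, c.shape)  # (10, 16) (6, 16)
```

Because the KV kernel's coefficients sum to zero, a flat image region yields a zero residual, so the residual stream sees only local texture variation. In the attention step, the two streams may hold different numbers of tokens: each output keeps the query stream's token count, while attention weights are normalized over the other stream's tokens.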
