With the advent of publicly available AI-based text-to-image systems, the process of creating photorealistic but fully synthetic images has been largely democratized. This can pose a threat to the public through a simplified spread of disinformation. Machine detectors and human media expertise can help to differentiate between AI-generated (fake) and real images and counteract this danger. Although AI generation models are highly prompt-dependent, the impact of the prompt on fake-detection performance has rarely been investigated so far. This work therefore examines the influence of the prompt's level of detail on the detectability of fake images, both with an AI detector and in a user study. For this purpose, we create a novel dataset, COCOXGEN, which consists of real photos from the COCO dataset as well as images generated with SDXL and Fooocus using prompts of two standardized lengths. Our user study with 200 participants shows that images generated with longer, more detailed prompts are detected significantly more easily than those generated with short prompts. Similarly, an AI-based detection model achieves better performance on images generated with longer prompts. However, humans and AI models seem to pay attention to different details, as we show in a heat map analysis.