Naïve Exposure of Generative AI Capabilities Undermines Deepfake Detection

Generative AI systems increasingly expose powerful reasoning and image refinement capabilities through user-facing chatbot interfaces. In this work, we show that the naïve exposure of such capabilities fundamentally undermines modern deepfake detectors. Rather than proposing a new image manipulation technique, we study a realistic and already-deployed usage scenario in which an adversary uses only benign, policy-compliant prompts and commercial generative AI systems. We demonstrate that state-of-the-art deepfake detection methods fail under semantic-preserving image refinement. Specifically, we show that generative AI systems articulate explicit authenticity criteria and inadvertently externalize them through unrestricted reasoning, enabling their direct reuse as refinement objectives. As a result, refined images simultaneously evade detection, preserve identity as verified by commercial face recognition APIs, and exhibit substantially higher perceptual quality. Importantly, we find that widely accessible commercial chatbot services pose a significantly greater security risk than open-source models, as their superior realism, semantic controllability, and low-barrier interfaces enable effective evasion by non-expert users. Our findings reveal a structural mismatch between the threat models assumed by current detection frameworks and the actual capabilities of real-world generative AI. While detection baselines are largely shaped by prior benchmarks, deployed systems expose unrestricted authenticity reasoning and refinement despite stringent safety controls in other domains.

