$PC^2$: Politically Controversial Content Generation via Jailbreaking Attacks on GPT-based Text-to-Image Models

The rapid evolution of text-to-image (T2I) models has enabled high-fidelity visual synthesis on a global scale. However, these advances have introduced significant security risks, particularly regarding the generation of harmful content. Politically harmful content, such as fabricated depictions of public figures, poses severe threats when weaponized for fake news or propaganda. Despite the severity of this threat, the robustness of current T2I safety filters against such politically motivated adversarial prompting remains underexplored. In response, we propose $PC^2$, the first black-box political jailbreaking framework for T2I models. It exploits a novel vulnerability: safety filters evaluate political sensitivity based on linguistic context. $PC^2$ operates in two steps: (1) Identity-Preserving Descriptive Mapping, which obfuscates sensitive keywords into neutral descriptions, and (2) Geopolitically Distal Translation, which maps these descriptions into fragmented, low-sensitivity languages. This strategy prevents filters from constructing toxic relationships between political entities within prompts, effectively bypassing detection. We construct a benchmark of 240 politically sensitive prompts involving 36 public figures. Evaluation on commercial T2I models, specifically the GPT series, shows that while all original prompts are blocked, $PC^2$ achieves attack success rates of up to 86%.

