The development of text-to-image (T2I) generative models, which enable the creation of high-quality synthetic images from textual prompts, has opened new frontiers in creative design and content generation. However, this paper reveals a significant and previously unrecognized ethical risk inherent in this technology and introduces a novel method, termed the Cognitive Morphing Attack (CogMorph), which manipulates T2I models to generate images that retain the original core subjects but embed toxic or harmful contextual elements. This nuanced manipulation exploits the cognitive principle that human perception of a concept is shaped by the entire visual scene and its context, producing images whose emotional harm far exceeds that of attacks that merely preserve the original semantics. To address this, we first construct an imagery toxicity taxonomy spanning 10 major categories and 48 sub-categories, aligned with human cognitive-perceptual dimensions, and further build a toxicity risk matrix that yields 1,176 high-quality toxic T2I prompts. Building on this, CogMorph first introduces Cognitive Toxicity Augmentation, which develops a cognitive toxicity knowledge base containing rich external representations of toxicity for humans (e.g., fine-grained visual features) that can be used to guide the optimization of adversarial prompts. In addition, we present Contextual Hierarchical Morphing, which hierarchically extracts the critical parts of the original prompt (e.g., scenes, subjects, and body parts) and then iteratively retrieves and fuses toxic features to inject harmful context. Extensive experiments on multiple open-source T2I models and black-box commercial APIs (e.g., DALLE-3) demonstrate the efficacy of CogMorph, which significantly outperforms other baselines by large margins (+20.62% on average).