Text-to-image (T2I) diffusion models are widely adopted for their strong generative capabilities, yet they remain vulnerable to backdoor attacks. Existing attacks typically rely on fixed textual triggers and single-entity backdoor targets, which makes them highly susceptible to enumeration-based input defenses and attention-consistency detection. In this work, we propose the Semantic-level Backdoor Attack (SemBD), which implants backdoors at the representation level by defining triggers as continuous semantic regions rather than discrete textual patterns. Concretely, SemBD injects semantic backdoors through distillation-based editing of the key and value projection matrices in the cross-attention layers, so that diverse prompts with the same semantic composition reliably activate the backdoor. To further enhance stealthiness, SemBD incorporates a semantic regularization term that prevents unintended activation under incomplete semantics, and it uses multi-entity backdoor targets that avoid highly consistent cross-attention patterns. Extensive experiments demonstrate that SemBD achieves a 100% attack success rate while maintaining strong robustness against state-of-the-art input-level defenses.
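To make the idea of editing a key or value projection concrete, the following is a minimal sketch and not the SemBD procedure itself. It swaps the distillation-based optimization for a closed-form ridge-regression edit of a single projection matrix. The function name `edit_projection`, the weight `lam`, and the random vectors standing in for text-encoder embeddings are all assumptions made for illustration. The edited matrix maps each trigger embedding to the output the original matrix gives for a target embedding, while a set of preserved embeddings (for example, prompts carrying only part of the trigger semantics) keeps its original outputs.

```python
import numpy as np

def edit_projection(W, triggers, targets, preserve, lam=1e-6):
    """Closed-form edit of one projection matrix (illustrative stand-in).

    Minimises over W_new:
        sum_i ||W_new c_i - W t_i||^2      (trigger c_i behaves like target t_i)
      + sum_j ||W_new p_j - W p_j||^2      (preserved inputs keep their outputs)
      + lam * ||W_new - W||_F^2            (stay close to the original weights)

    W:        (d_out, d_in) original key or value projection
    triggers: (n, d_in) rows c_i
    targets:  (n, d_in) rows t_i
    preserve: (m, d_in) rows p_j
    """
    d_in = W.shape[1]
    eye = lam * np.eye(d_in)
    # Normal equations: W_new @ A = B
    A = triggers.T @ triggers + preserve.T @ preserve + eye
    B = W @ (targets.T @ triggers + preserve.T @ preserve + eye)
    # A is symmetric, so W_new = B A^{-1} = (A^{-1} B^T)^T
    return np.linalg.solve(A, B.T).T

# Toy usage with random vectors in place of real text-encoder embeddings (assumption).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
C = rng.normal(size=(3, 16))   # trigger embeddings
T = rng.normal(size=(3, 16))   # target embeddings
P = rng.normal(size=(4, 16))   # embeddings whose behaviour must not change
W_new = edit_projection(W, C, T, P)

trigger_gap = np.abs(W_new @ C.T - W @ T.T).max()
preserve_gap = np.abs(W_new @ P.T - W @ P.T).max()
print(f"max trigger gap: {trigger_gap:.2e}, max preserve gap: {preserve_gap:.2e}")
```

Because the edit is linear, it cannot separate a trigger that is an exact linear combination of the preserved embeddings. The toy example therefore uses independent random vectors, and whether real prompt embeddings are separable in this way is not something this sketch establishes.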