Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models

Text-to-image diffusion models are increasingly developed through open-source reuse and repeated downstream fine-tuning, where reused checkpoints are difficult to verify and thus more susceptible to hidden backdoor behaviors. In such ecosystems, a single pretrained model may be sequentially adapted and redistributed by multiple independent parties, allowing multiple concept-specific trigger-target associations to accumulate in the same model. When these associations coexist, semantic conflicts can be amplified in the shared representation space, leading to cross-concept entanglement and degraded generation quality. Notably, instead of strengthening the attack, such accumulation can destabilize previously injected behaviors and reduce attack reliability. In this work, we systematically investigate backdoor attacks under this interference-prone setting and propose Hydra, a unified framework for robust and controlled multi-concept backdoor injection under cumulative and decentralized reuse. Our core insight is that stable backdoor injection in large-scale multi-concept settings requires explicitly constraining trigger semantics while coordinating cross-task interactions during optimization. Specifically, Hydra performs evolutionary trigger search in the text encoder space to identify triggers that are semantically aligned with their target concepts while remaining stable across other injected concepts. It further combines multi-task fine-tuning with trigger-clean regularization to improve training stability under dense multi-concept injection. Extensive experiments across multiple diffusion backbones under rigorous multi-concept settings show that Hydra maintains effective backdoor activation while preserving clean generation fidelity and image quality. For instance, across 8 attackers and 500 concept pairs, Hydra maintains an attack success rate (ASR) of approximately 95% along with strong clean generation quality.
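
The abstract reports ASR without defining how it is computed. As a minimal sketch, assuming the common convention that ASR is the fraction of triggered prompts whose generated image is judged to show the target concept, the aggregate over concept pairs could be computed as below. The function names, the macro-averaging choice, and the toy data are all hypothetical illustrations, not taken from the paper.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of triggered generations judged to show the target concept.

    Assumption: each entry is a per-image success flag produced by some
    external judge (e.g. a classifier); the paper's judge is not specified.
    """
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(outcomes) / len(outcomes)


def mean_asr(per_pair: Sequence[Sequence[bool]]) -> float:
    """Macro-average of per-pair ASR (an assumed aggregation choice)."""
    if not per_pair:
        raise ValueError("per_pair must be non-empty")
    return sum(attack_success_rate(o) for o in per_pair) / len(per_pair)


# Hypothetical toy data: 3 concept pairs, 4 triggered prompts each.
toy = [
    [True, True, True, True],
    [True, True, True, False],
    [True, False, True, True],
]
print(f"mean ASR: {mean_asr(toy):.3f}")
```

Macro-averaging weights every concept pair equally regardless of how many prompts it was evaluated on; pooling all outcomes into one list would instead weight pairs by prompt count.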

