Text-to-image diffusion models are increasingly developed through open-source reuse and repeated downstream fine-tuning. Because reused checkpoints are difficult to verify, they are more susceptible to carrying hidden backdoor behaviors. In such ecosystems, a single pretrained model may be sequentially adapted and redistributed by several independent parties, so that multiple concept-specific trigger-target associations accumulate in the same model. When these associations coexist, semantic conflicts can be amplified in the shared representation space, leading to cross-concept entanglement and degraded generation quality. Notably, rather than strengthening the attack, such accumulation can destabilize previously injected behaviors and reduce attack reliability. In this work, we systematically investigate backdoor attacks in this interference-prone setting and propose Hydra, a unified framework for robust and controlled multi-concept backdoor injection under cumulative and decentralized reuse. Our core insight is that stable backdoor injection in large-scale multi-concept settings requires explicitly constraining trigger semantics while coordinating cross-task interactions during optimization. Specifically, Hydra performs an evolutionary trigger search in the text encoder space to identify triggers that are semantically aligned with their target concepts while remaining stable in the presence of other injected concepts. It further combines multi-task fine-tuning with trigger-clean regularization to improve training stability under dense multi-concept injection. Extensive experiments across multiple diffusion backbones under rigorous multi-concept settings show that Hydra maintains effective backdoor activation while preserving clean generation fidelity and image quality. For instance, with 8 attackers and 500 concept pairs, Hydra maintains an attack success rate (ASR) of approximately 95% while retaining strong clean generation.
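To make the idea of an evolutionary trigger search more concrete, the following is a minimal toy sketch and not Hydra's actual procedure. Everything in it is an assumption made for illustration: `toy_embed` is a hypothetical stand-in for a real text encoder, the fitness (cosine alignment with the target concept minus a penalty for similarity to other injected concepts) is one plausible reading of "semantically aligned while remaining stable," and the names `fitness`, `evolve_trigger`, and `penalty` do not come from the abstract.

```python
import math
import random


def toy_embed(tokens, dim=16):
    """Hypothetical stand-in for a text encoder (assumption, not a real model).

    Each token gets a deterministic pseudo-random vector; a sequence is
    embedded as the mean of its token vectors.
    """
    vec = [0.0] * dim
    for tok in tokens:
        rng = random.Random(tok)  # string seed gives a fixed vector per token
        for i in range(dim):
            vec[i] += rng.gauss(0.0, 1.0)
    return [v / len(tokens) for v in vec]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fitness(trigger, target, others, penalty=0.5):
    """Assumed objective: align with the target, avoid other concepts."""
    t = toy_embed(trigger)
    align = cosine(t, toy_embed([target]))
    interference = max((cosine(t, toy_embed([o])) for o in others), default=0.0)
    return align - penalty * interference


def evolve_trigger(vocab, target, others, length=3, pop_size=20,
                   generations=30, seed=0):
    """Elitist evolutionary search over fixed-length token sequences."""
    rng = random.Random(seed)
    population = [tuple(rng.choice(vocab) for _ in range(length))
                  for _ in range(pop_size)]

    def score(cand):
        return fitness(cand, target, others)

    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        elite = population[: pop_size // 4]  # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            child = list(rng.choice(elite))
            child[rng.randrange(length)] = rng.choice(vocab)  # point mutation
            children.append(tuple(child))
        population = elite + children
    best = max(population, key=score)
    history.append(score(best))
    return best, history


# Usage: search for a 3-token trigger for the concept "dog" while
# penalizing similarity to two other (hypothetical) injected concepts.
vocab = ["cat", "car", "tree", "river", "lamp", "stone", "cloud", "glass"]
best, history = evolve_trigger(vocab, "dog", ["car", "tree"])
print(best, round(history[-1], 3))
```

Because the elite candidates survive each generation unchanged, the best fitness in this sketch never decreases. A realistic version would replace `toy_embed` with the text encoder of the diffusion model under study and would likely need a richer search space and objective than the ones assumed here.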