While diffusion models have revolutionized visual content generation, their rapid adoption has underscored the need to investigate their vulnerabilities, such as susceptibility to backdoor attacks. In multimodal diffusion models, it is natural to expect that attacking multiple modalities simultaneously (e.g., text and image) would yield complementary effects and strengthen the overall backdoor. In this paper, we challenge this assumption by investigating Backdoor Modality Collapse, a phenomenon in which the backdoor mechanism degenerates to rely predominantly on a subset of modalities, rendering the others redundant. To quantify this behavior, we introduce two novel metrics: Trigger Modality Attribution (TMA) and Cross-Trigger Interaction (CTI). Through extensive experiments across diverse training configurations in multimodal conditional diffusion, we consistently observe a ``winner-takes-all'' dynamic in backdoor behavior. Our results reveal that (1) attacks often collapse into dominance by a subset of modalities, and (2) cross-modal interaction is negligible or even negative, contradicting the intuition of synergistic vulnerability. These findings highlight a blind spot in current backdoor assessments: high attack success rates often mask a fundamental reliance on a subset of modalities. Our analysis provides a principled foundation for mechanistic analysis and future defense development.