As text-to-image diffusion models become increasingly deployed in real-world applications, concerns about backdoor attacks have gained significant attention. Prior work on text-based backdoor attacks has largely focused on diffusion models conditioned on a single lightweight text encoder. However, more recent diffusion models that incorporate multiple large-scale text encoders remain underexplored in this context. Given the substantially increased number of trainable parameters introduced by multiple text encoders, an important question is whether backdoor attacks can remain both efficient and effective in such settings. In this work, we study Stable Diffusion 3, which uses three distinct text encoders and has not yet been systematically analyzed for text-encoder-based backdoor vulnerabilities. To understand the role of text encoders in backdoor attacks, we define four categories of attack targets and identify the minimal set of encoders required to achieve effective performance for each attack objective. Based on this, we further propose Multi-Encoder Lightweight aTtacks (MELT), which trains only low-rank adapters while keeping the pretrained text encoder weights frozen. We demonstrate that tuning fewer than 0.2% of the total encoder parameters is sufficient for successful backdoor attacks on Stable Diffusion 3, revealing previously underexplored vulnerabilities in practical multi-encoder attack scenarios.
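To make the "fewer than 0.2% of parameters" claim concrete, the sketch below illustrates the standard low-rank adapter (LoRA) mechanism that MELT builds on: the pretrained projection W is frozen, and only two small factors A and B are trained. The dimensions and rank are hypothetical, chosen for illustration rather than taken from Stable Diffusion 3's actual encoders.

```python
def lora_forward(x, W, A, B, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B train."""
    d_out, d_in = len(W), len(W[0])
    # Base path: the frozen pretrained projection.
    base = [sum(W[i][j] * x[j] for j in range(d_in)) for i in range(d_out)]
    # Adapter path: down-project to rank r, then up-project back to d_out.
    down = [sum(A[k][j] * x[j] for j in range(d_in)) for k in range(r)]
    up = [sum(B[i][k] * down[k] for k in range(r)) for i in range(d_out)]
    return [base[i] + (alpha / r) * up[i] for i in range(d_out)]

def trainable_fraction(d_in, d_out, r):
    """Share of a layer's parameters that train when only A (r x d_in)
    and B (d_out x r) are updated, versus the full d_out x d_in weight."""
    return r * (d_in + d_out) / (d_in * d_out)

# For a hypothetical 4096x4096 projection at rank r = 4, the adapter
# trains roughly 0.2% of that layer's parameters:
print(f"{trainable_fraction(4096, 4096, 4):.4%}")  # → 0.1953%
```

With B initialized to zero (the usual LoRA convention), the adapted layer initially reproduces the frozen model exactly, which is what lets a backdoor be injected without disturbing clean behavior until the factors are trained.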