Despite the remarkable progress of diffusion models in image generation, recent studies reveal that they are vulnerable to backdoor attacks implanted through covert visual or textual triggers. Evolving defense mechanisms can detect most existing threats through visual inspection or feature analysis. We introduce BadBlocks, a novel, lightweight, and highly covert attack that challenges these safeguards. By selectively poisoning specific blocks within the UNet architecture while keeping the other components intact, BadBlocks requires only 30% of the computational resources and 20% of the GPU time of conventional attacks, making backdoor injection feasible on consumer-grade GPUs. Empirical evaluations demonstrate that BadBlocks achieves a high attack success rate with negligible loss of perceptual quality while bypassing state-of-the-art defenses, particularly attention-based detection frameworks. Layer-level ablation studies further confirm that backdoor mapping does not require full-network fine-tuning and reveal that different layers differ in their vulnerability. Overall, BadBlocks significantly lowers the barrier to executing backdoor attacks and therefore presents a critical security risk. Our code is available at: https://github.com/paoche11/BadBlocks.
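To make concrete the idea of updating only selected UNet blocks while the rest of the network stays frozen, the following is a minimal sketch of the selection step alone. It is not the BadBlocks implementation: the parameter names, their sizes, and the choice of `mid_block` as the target are all hypothetical, since the abstract does not state which blocks are selected.

```python
from fnmatch import fnmatchcase

# Hypothetical parameter inventory: name -> number of weights.
# The names imitate a UNet with down/mid/up blocks; the sizes are invented.
PARAM_SIZES = {
    "conv_in.weight": 1_000,
    "down_blocks.0.resnets.0.conv1.weight": 4_000,
    "down_blocks.0.attentions.0.to_q.weight": 2_000,
    "mid_block.resnets.0.conv1.weight": 6_000,
    "mid_block.attentions.0.to_q.weight": 3_000,
    "up_blocks.0.resnets.0.conv1.weight": 4_000,
    "up_blocks.0.attentions.0.to_q.weight": 2_000,
    "conv_out.weight": 1_000,
}


def select_trainable(param_sizes, patterns):
    """Return the parameter names matching any glob pattern.

    Every name that is not returned is meant to stay frozen.
    """
    return {
        name
        for name in param_sizes
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    }


def trainable_fraction(param_sizes, trainable):
    """Share of all weights that would receive gradient updates."""
    total = sum(param_sizes.values())
    return sum(param_sizes[name] for name in trainable) / total


# Assumption for illustration only: fine-tune the middle block, freeze the rest.
TARGET_PATTERNS = ["mid_block.*"]

trainable = select_trainable(PARAM_SIZES, TARGET_PATTERNS)
fraction = trainable_fraction(PARAM_SIZES, trainable)

for name in sorted(PARAM_SIZES):
    status = "train " if name in trainable else "frozen"
    print(f"{status}  {name}")
print(f"trainable fraction: {fraction:.3f}")
```

In a PyTorch training loop, the same selection would typically be applied by setting `requires_grad` to `False` on every parameter outside the selected set, so that the optimizer only updates the chosen blocks. The fraction printed here reflects the invented sizes only and says nothing about the 30% and 20% figures reported above.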