Diffusion models have transformed image generation, yet controlling their outputs to reliably erase undesired concepts remains challenging. Existing approaches usually require task-specific training and struggle to generalize across both concrete concepts (e.g., objects) and abstract ones (e.g., styles). We propose CASteer (Cross-Attention Steering), a training-free framework for concept erasure in diffusion models that uses steering vectors to dynamically influence hidden representations. CASteer precomputes concept-specific steering vectors by averaging neural activations from images generated for each target concept. During inference, it applies these vectors only when the undesired concept appears, suppressing it while leaving unrelated regions unaffected. This selective activation enables precise, context-aware erasure without degrading overall image quality, and it removes harmful or unwanted content across a wide range of visual concepts without any model retraining. CASteer outperforms state-of-the-art concept erasure techniques while preserving unrelated content and minimizing unintended side effects.
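The precompute-then-steer idea described above can be sketched in a few lines. The following is a minimal, illustrative NumPy sketch, not the paper's implementation: `toy_activations` is a hypothetical stand-in for a diffusion model's cross-attention hidden states, and the threshold-gated subtraction only approximates the selective application described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_activations(prompts, dim=8):
    # Hypothetical stand-in for the cross-attention hidden states a
    # diffusion model would produce for each prompt: prompts mentioning
    # the target concept get a shifted activation distribution.
    return np.stack([rng.normal(size=dim) + (2.0 if "concept" in p else 0.0)
                     for p in prompts])

# 1) Precompute a steering vector for one concept: average activations
#    for concept prompts minus the average for neutral prompts.
concept_acts = toy_activations(["a photo of concept"] * 16)
neutral_acts = toy_activations(["a photo"] * 16)
steer = concept_acts.mean(axis=0) - neutral_acts.mean(axis=0)
steer /= np.linalg.norm(steer)

def erase(h, steer, threshold=1.0, strength=1.0):
    # 2) At inference, steer only when the activation aligns strongly
    #    with the concept direction; otherwise leave it untouched.
    score = h @ steer
    if score > threshold:
        return h - strength * score * steer
    return h

h_concept = toy_activations(["a photo of concept"])[0]
h_erased = erase(h_concept, steer)
# With strength=1, the component along the concept direction is
# removed, so the erased activation is nearly orthogonal to it.
assert abs(h_erased @ steer) < abs(h_concept @ steer)
```

The gating on `score` is what makes the erasure selective: activations that do not point toward the concept direction pass through unchanged, which is how unrelated image content is preserved.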