Diffusion models have become a cornerstone of image editing, offering flexible control over source images through language prompts. A key challenge, however, is attribute leakage: unintended modifications that appear in non-target regions, or within target regions themselves due to attribute interference. Existing methods often leak because they rely on naive text embeddings and handle End-of-Sequence (EOS) token embeddings inadequately. To address this, we propose ALE-Edit (Attribute-Leakage-free Editing), a novel framework that minimizes attribute leakage through three components: (1) Object-Restricted Embeddings (ORE), which localize object-specific attributes in the text embeddings; (2) Region-Guided Blending for Cross-Attention Masking (RGB-CAM), which aligns cross-attention with target regions; and (3) Background Blending (BB), which preserves non-edited regions. We also introduce ALE-Bench, a benchmark for evaluating attribute leakage with new metrics for target-external and target-internal leakage. Experiments demonstrate that our framework significantly reduces attribute leakage while maintaining high editing quality, providing an efficient, tuning-free solution for multi-object image editing.
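To make the two spatial components concrete, below is a minimal sketch, not the paper's implementation, of region-restricted cross-attention (the RGB-CAM idea) and background blending (BB). It assumes PyTorch, precomputed binary region masks, and a token-to-region assignment; the names `masked_cross_attention` and `blend_background` are hypothetical.

```python
# Hypothetical sketch of region-guided cross-attention masking and
# background blending, as described in the abstract. Not the authors' code.

import torch

def masked_cross_attention(q, k, v, token_to_region, region_masks):
    """Cross-attention in which each text token may only be attended to
    from pixels inside its object's region.

    q: (B, N_pix, d)    image-query features
    k, v: (B, N_tok, d) text key/value features
    token_to_region: (N_tok,) long tensor mapping each token to a region id
                     (-1 for global tokens such as EOS/padding, which
                     remain visible everywhere)
    region_masks: (B, R, N_pix) binary masks, one per target region
    """
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5  # (B, N_pix, N_tok)

    # Token t is visible at pixel p only if p lies inside token t's region.
    allow = torch.ones_like(scores, dtype=torch.bool)
    for t, r in enumerate(token_to_region.tolist()):
        if r >= 0:
            allow[:, :, t] = region_masks[:, r].bool()

    scores = scores.masked_fill(~allow, float("-inf"))
    attn = scores.softmax(dim=-1)
    return attn @ v  # (B, N_pix, d)

def blend_background(edited_latent, source_latent, union_mask):
    """Background blending: keep the source latent outside the union of
    all target regions, so non-edited areas are preserved exactly."""
    return union_mask * edited_latent + (1.0 - union_mask) * source_latent
```

Restricting each object token's attention to its own region is one plausible way to prevent an attribute from one prompt phrase bleeding into another object's pixels, while the blending step guarantees the background is untouched by construction.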