Background consistency remains a significant challenge in image editing tasks. Despite extensive developments, existing works still face a trade-off between maintaining similarity to the original image and generating content that aligns with the target. Here, we propose KV-Edit, a training-free approach that uses the KV cache in DiTs to maintain background consistency: background tokens are preserved rather than regenerated, eliminating the need for complex mechanisms or expensive training, while new content is generated within user-provided regions that integrates seamlessly with the background. We further explore the memory consumption of the KV cache during editing and optimize the space complexity to $O(1)$ using an inversion-free method. Our approach is compatible with any DiT-based generative model without additional training. Experiments demonstrate that KV-Edit significantly outperforms existing approaches in terms of both background preservation and image quality, even surpassing training-based methods. The project webpage is available at https://xilluill.github.io/projectpages/KV-Edit
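To illustrate the core idea, the following is a minimal, hypothetical sketch (not the authors' implementation) of attention with cached background keys/values: K/V of background tokens are computed once, and during editing only tokens inside the user-provided mask are regenerated, with their queries attending to both the new and the cached background K/V. All array names, shapes, and the toy `attention` helper are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8             # head dimension (toy value)
n_tokens = 16     # image tokens (toy value)
mask = np.zeros(n_tokens, dtype=bool)
mask[4:8] = True  # user-provided edit region

def attention(q, k, v):
    # Plain scaled dot-product attention with a softmax over keys.
    scores = q @ k.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Pass over the source image: cache K/V of the background tokens
# (stand-ins for the projected keys/values of unmasked tokens).
x_src = rng.normal(size=(n_tokens, d))
k_cache = x_src[~mask]
v_cache = x_src[~mask]

# Editing pass: only masked tokens are denoised; their queries attend
# to both the freshly computed K/V and the cached background K/V, so
# background tokens are never regenerated.
x_new = rng.normal(size=(mask.sum(), d))
k = np.concatenate([k_cache, x_new])
v = np.concatenate([v_cache, x_new])
out = attention(x_new, k, v)   # updates only the edit region
assert out.shape == (mask.sum(), d)
```

Because the background K/V are read from the cache rather than recomputed, the background portion of the image is bit-for-bit preserved across the edit.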