Text-guided image editing aims to modify specific regions according to a target prompt while preserving the identity of the source image. Recent methods exploit explicit binary masks to constrain editing, but hard mask boundaries introduce artifacts and reduce editability. To address these issues, we propose FusionEdit, a training-free image editing framework that achieves precise and controllable edits. First, editing and preserved regions are automatically identified by measuring semantic discrepancies between the source and target prompts. To mitigate boundary artifacts, FusionEdit performs distance-aware latent fusion along region boundaries to yield a soft yet accurate mask, and employs a total variation loss to enforce smooth transitions, producing natural editing results. Second, FusionEdit leverages AdaIN-based modulation within DiT attention layers to perform statistical attention fusion in the editing region, enhancing editability while preserving global consistency with the source image. Extensive experiments demonstrate that FusionEdit significantly outperforms state-of-the-art methods. Code is available at \href{https://github.com/Yvan1001/FusionEdit}{https://github.com/Yvan1001/FusionEdit}.
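The two mechanisms named above can be sketched concretely. The snippet below is a minimal, self-contained illustration in pixel space, not the paper's implementation: all function names, the toy latent shapes, and the brute-force distance computation are our own assumptions; the actual method operates on diffusion latents and DiT attention features.

```python
import numpy as np

def soft_mask(hard_mask, radius=3):
    """Distance-aware soft mask: the blend weight ramps from 0 to 1 over
    `radius` pixels inside the edit region. Illustrative brute-force L1
    distance to the nearest preserved pixel (not the paper's method)."""
    soft = np.zeros(hard_mask.shape, dtype=float)
    outside = np.argwhere(hard_mask == 0)
    for y, x in np.argwhere(hard_mask > 0):
        d = np.abs(outside - (y, x)).sum(axis=1).min()  # L1 distance to boundary
        soft[y, x] = min(d / radius, 1.0)
    return soft

def tv_loss(mask):
    """Total-variation penalty: sum of absolute neighbor differences,
    which is lower for smooth boundary transitions."""
    return np.abs(np.diff(mask, axis=0)).sum() + np.abs(np.diff(mask, axis=1)).sum()

def adain(edit_feat, src_feat, eps=1e-5):
    """AdaIN-style modulation: renormalize edit features to the source's
    channel-wise mean and standard deviation over spatial dimensions."""
    mu_e = edit_feat.mean(axis=(-2, -1), keepdims=True)
    std_e = edit_feat.std(axis=(-2, -1), keepdims=True)
    mu_s = src_feat.mean(axis=(-2, -1), keepdims=True)
    std_s = src_feat.std(axis=(-2, -1), keepdims=True)
    return std_s * (edit_feat - mu_e) / (std_e + eps) + mu_s

# Toy example: an 8x8 single-channel "latent" with a 4x4 edit region.
rng = np.random.default_rng(0)
hard = np.zeros((8, 8))
hard[2:6, 2:6] = 1
m = soft_mask(hard)
z_src = rng.normal(size=(1, 8, 8))
z_edit = rng.normal(size=(1, 8, 8))
z_fused = m * z_edit + (1 - m) * z_src   # distance-aware latent fusion
z_mod = adain(z_edit, z_src)             # edit statistics matched to the source
```

Note that the soft mask leaves preserved pixels exactly equal to the source latent, blends near the boundary, and has strictly lower total variation than the hard mask, which is the intuition behind the smoothness term.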