Seamlessly moving objects within a scene is a common requirement in image editing, yet it remains challenging for existing methods. For real-world images in particular, occlusion further increases the difficulty: the occluded portion of an object must be completed before the object can be moved. To leverage the real-world knowledge embedded in pre-trained diffusion models, we propose a diffusion-based framework specifically designed for Occluded Object Movement, named DiffOOM. DiffOOM consists of two parallel branches that perform object de-occlusion and movement simultaneously. The de-occlusion branch uses a background color-fill strategy and a continuously updated object mask to focus the diffusion process on completing the occluded portion of the target object. Concurrently, the movement branch employs latent optimization to place the completed object at the target location and adopts local text-conditioned guidance to integrate the object naturally into its new surroundings. Extensive evaluations demonstrate the superior performance of our method, which is further validated by a comprehensive user study.
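The two-branch structure described above can be illustrated with a purely schematic sketch. Everything here is a hypothetical placeholder, not the authors' implementation: the toy "denoiser" is random noise subtraction, the latent is a small NumPy array, and the latent-optimization objective is a toy L2 pull toward a shifted copy of the object region.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(latent, mask):
    # Placeholder for one diffusion denoising step confined to the masked
    # region (hypothetical stand-in for a real denoiser network).
    update = rng.standard_normal(latent.shape) * 0.01
    return latent - update * mask  # only the masked region is refined

def deocclusion_branch(latent, object_mask, steps=10):
    # De-occlusion branch: the object mask focuses the diffusion process on
    # completing the target object, loosely mimicking the background
    # color-fill + continuously updated mask strategy described above.
    for _ in range(steps):
        latent = denoise_step(latent, object_mask)
        # In the real method the object mask would be re-estimated each step.
    return latent

def movement_branch(latent, dst_mask, lr=0.5):
    # Movement branch: latent optimization nudges the latent so the object
    # appears at the target location (toy L2 objective; the real method also
    # applies local text-conditioned guidance, omitted here).
    target = np.roll(latent, shift=8, axis=1)  # hypothetical target placement
    grad = (latent - target) * dst_mask
    return latent - lr * grad

# Toy 16x16 single-channel "latent" and masks.
latent = rng.standard_normal((16, 16))
obj_mask = np.zeros((16, 16))
obj_mask[4:8, 4:8] = 1.0
dst_mask = np.roll(obj_mask, shift=8, axis=1)

completed = deocclusion_branch(latent, obj_mask)
moved = movement_branch(completed, dst_mask)
print(moved.shape)  # (16, 16)
```

The sketch only conveys the control flow: complete the occluded object first, then optimize the latent to place it at the new location.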