Diffusion models have revolutionized image editing but often generate images that violate physical laws, particularly in the effects of objects on the scene, e.g., occlusions, shadows, and reflections. By analyzing the limitations of self-supervised approaches, we propose a practical solution centered on a "counterfactual" dataset. Our method involves capturing a scene before and after removing a single object, while minimizing other changes. By fine-tuning a diffusion model on this dataset, we are able to remove not only objects but also their effects on the scene. However, we find that applying this approach to photorealistic object insertion requires an impractically large dataset. To tackle this challenge, we propose bootstrap supervision: leveraging our object removal model trained on a small counterfactual dataset, we synthetically expand this dataset considerably. Our approach significantly outperforms prior methods in photorealistic object removal and insertion, particularly at modeling the effects of objects on the scene.
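The bootstrap supervision idea can be illustrated with a minimal sketch. The abstract does not specify how the synthetic pairs are built, so everything here is an assumption: the hypothetical `removal_model` callable stands in for the fine-tuned removal model, the input to the insertion model is formed by pasting the object pixels back onto the object-free background, and the original image (with its real shadows and reflections) serves as the target. The `toy_removal_model` is only a placeholder so the example runs.

```python
import numpy as np


def toy_removal_model(image, mask):
    """Placeholder for a trained removal model (hypothetical).

    Fills the masked pixels with the mean colour of the unmasked pixels.
    A real model would also erase shadows and reflections outside the mask.
    """
    out = image.copy()
    out[mask] = image[~mask].mean(axis=0)
    return out


def synthesize_insertion_pair(image, mask, removal_model):
    """Build one (input, target) training pair for object insertion.

    image: H x W x 3 float array showing the object and its effects.
    mask:  H x W boolean array marking the object pixels.
    """
    # Step 1: estimate the scene without the object (assumed model call).
    background = removal_model(image, mask)
    # Step 2: paste the object pixels back, without shadows or reflections.
    pasted = np.where(mask[..., None], image, background)
    # Step 3: the original image is the photorealistic target.
    return pasted, image
```

Applied to a large collection of ordinary images with object masks, a loop over `synthesize_insertion_pair` would yield an insertion dataset much larger than the captured counterfactual one, under the assumptions stated above.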