Diffusion Handles is a novel approach to enabling 3D object edits on diffusion-generated images. We accomplish these edits using existing pre-trained diffusion models and 2D image depth estimation, without any fine-tuning or 3D object retrieval. The edited results remain plausible and photo-real, and they preserve object identity. Diffusion Handles addresses a critically missing facet of generative-image-based creative design and significantly advances the state of the art in generative image editing. Our key insight is to lift the diffusion activations for an object to 3D using a proxy depth, apply a 3D transformation to the depth and its associated activations, and project them back to image space. The diffusion process, applied to the manipulated activations with identity control, produces plausible edited images showing complex 3D occlusion and lighting effects. We evaluate Diffusion Handles quantitatively on a large synthetic data benchmark and qualitatively by a user study, showing our output to be more plausible than prior art and better at both 3D editing and identity control.
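The lift, transform, and project step described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with a known intrinsics matrix `K`, a rigid 4x4 transform `T`, and a per-pixel feature map standing in for the diffusion activations. The function name `warp_features` and the nearest-pixel splatting with a z-buffer are illustrative choices that do not come from the paper.

```python
import numpy as np

def warp_features(feats, depth, K, T):
    """Hypothetical helper: lift per-pixel features to 3D, transform, reproject.

    feats: (H, W, C) feature map (stand-in for diffusion activations)
    depth: (H, W) proxy depth along the camera z-axis
    K:     (3, 3) pinhole intrinsics (an assumption of this sketch)
    T:     (4, 4) rigid transform applied to the lifted points
    Returns the warped (H, W, C) features and an (H, W) boolean coverage mask.
    """
    H, W, C = feats.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Lift: back-project each pixel to a 3D point using its depth.
    pts = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)

    # Transform: apply the 3D edit in homogeneous coordinates.
    pts_h = np.concatenate([pts, np.ones((pts.shape[0], 1))], axis=1)
    pts_t = (pts_h @ T.T)[:, :3]

    # Keep only points that end up in front of the camera.
    flat = feats.reshape(-1, C)
    front = pts_t[:, 2] > 1e-6
    pts_t, flat = pts_t[front], flat[front]

    # Project: map the transformed points back to the nearest pixel.
    proj = pts_t @ K.T
    u2 = np.rint(proj[:, 0] / proj[:, 2]).astype(int)
    v2 = np.rint(proj[:, 1] / proj[:, 2]).astype(int)
    inside = (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H)
    lin = v2[inside] * W + u2[inside]
    z2, flat = pts_t[inside, 2], flat[inside]

    # Z-buffer: for each target pixel keep the nearest point (handles occlusion).
    order = np.lexsort((z2, lin))  # primary key: pixel index, secondary: depth
    lin_sorted = lin[order]
    first = np.ones(lin_sorted.shape[0], dtype=bool)
    first[1:] = lin_sorted[1:] != lin_sorted[:-1]
    keep = order[first]

    out = np.zeros((H * W, C), dtype=feats.dtype)
    mask = np.zeros(H * W, dtype=bool)
    out[lin[keep]] = flat[keep]
    mask[lin[keep]] = True
    return out.reshape(H, W, C), mask.reshape(H, W)
```

Pixels where the returned mask is `False` are disoccluded regions with no source activation. In the approach described above, the subsequent diffusion process is what fills such regions plausibly; this sketch only covers the geometric warp.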