We propose a generative technique to edit 3D shapes, represented as meshes, NeRFs, or Gaussian Splats, in approximately 3 seconds, without running an SDS-style optimization. Our key insight is to cast 3D editing as a multiview image inpainting problem: this representation is generic and can be mapped back to any 3D representation using the bank of available Large Reconstruction Models. We explore different fine-tuning strategies to obtain both multiview generation and inpainting capabilities within the same diffusion model. In particular, the design of the inpainting mask is an important factor in training an inpainting model, and we propose several masking strategies that mimic the types of edits a user would perform on a 3D shape. Our approach takes 3D generative editing from hours to seconds and produces higher-quality results than previous works.