Towards a Training Free Approach for 3D Scene Editing

Text-driven diffusion models have shown remarkable capabilities in editing images. However, existing work on editing 3D scenes mostly relies on training a NeRF. Recent NeRF editing methods perform edits with 2D diffusion models and project these edits into 3D space. They require strong positional priors alongside the text prompt to identify the edit location. These methods operate on small 3D scenes and are tied to a particular scene rather than generalizing across scenes. They require training for each specific edit and therefore cannot be used for real-time editing. To address these limitations, we propose a novel method, FreeEdit, which performs edits in a training-free manner using mesh representations as a substitute for NeRF. Training-free methods are now possible because of advances in foundation models. We leverage these models to offer a training-free alternative and introduce solutions for insertion, replacement, and deletion. We treat insertion, replacement, and deletion as basic building blocks that can be combined to perform more intricate edits. Given a text prompt and a 3D scene, our model identifies which object should be inserted, replaced, or deleted and the location where the edit should be performed. As part of FreeEdit, we also introduce a novel algorithm that finds the optimal location on the grounding object for placement. We evaluate our model against baseline models on a wide range of scenes using quantitative and qualitative metrics and showcase the merits of our method.
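
To make the idea of composing intricate edits from the three basic operations concrete, the following is a minimal illustrative sketch, not the FreeEdit implementation. The `Scene` class, the function names, and the representation of a scene as a mapping from object names to positions are all assumptions introduced here for illustration; a real mesh scene, the text-prompt parsing, and the placement algorithm are out of scope. The toy `replace` is written as a delete followed by an insert purely for brevity, and `move` shows how a compound edit could be built from the primitives.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Scene:
    """Hypothetical toy stand-in for a mesh scene: object name -> position."""
    objects: Dict[str, Vec3] = field(default_factory=dict)


def insert(scene: Scene, name: str, position: Vec3) -> Scene:
    """Return a new scene with `name` placed at `position`."""
    if name in scene.objects:
        raise ValueError(f"object {name!r} already exists")
    return Scene({**scene.objects, name: position})


def delete(scene: Scene, name: str) -> Scene:
    """Return a new scene without `name`."""
    if name not in scene.objects:
        raise KeyError(name)
    return Scene({k: v for k, v in scene.objects.items() if k != name})


def replace(scene: Scene, old: str, new: str) -> Scene:
    """Swap `old` for `new` at the same position (assumption: delete + insert)."""
    position = scene.objects[old]
    return insert(delete(scene, old), new, position)


def move(scene: Scene, name: str, position: Vec3) -> Scene:
    """Compound edit built from the primitives: delete, then re-insert elsewhere."""
    return insert(delete(scene, name), name, position)


scene = Scene({"table": (0.0, 0.0, 0.0), "mug": (0.0, 0.8, 0.0)})
edited = move(replace(scene, "mug", "vase"), "vase", (1.0, 0.8, 0.0))
print(edited.objects)
```

Each function returns a new `Scene` rather than mutating its input, so a sequence of edits can be chained and the original scene stays available for comparison.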
