The creation of 3D scenes has traditionally been both labor-intensive and costly, requiring designers to meticulously configure 3D assets and environments. Recent advances in generative AI, including text-to-3D and image-to-3D methods, have dramatically reduced the complexity and cost of this process. However, current techniques for editing complex 3D scenes still generally rely on interactive, multi-step 2D-to-3D projection methods and diffusion-based techniques, which often lack precise control and hamper real-time performance. In this work, we propose 3DSceneEditor, a fully 3D paradigm for real-time, precise editing of intricate 3D scenes using Gaussian Splatting. Unlike conventional methods, 3DSceneEditor operates through a streamlined 3D pipeline, enabling direct manipulation of Gaussians for efficient, high-quality edits driven by input prompts. The proposed framework (i) integrates a pre-trained instance segmentation model for semantic labeling; (ii) employs a zero-shot grounding approach with CLIP to align target objects with user prompts; and (iii) applies scene modifications, such as object addition, repositioning, recoloring, replacement, and deletion, directly on the Gaussians. Extensive experimental results show that 3DSceneEditor achieves superior editing precision and speed over current state-of-the-art 3D scene editing approaches, establishing a new benchmark for efficient and interactive 3D scene customization.
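The zero-shot grounding step (ii) can be sketched as cosine-similarity matching between a CLIP text embedding of the user prompt and one pooled CLIP feature per segmented instance. This is a minimal illustration, not the authors' implementation: the function name, the similarity threshold, and the random placeholder embeddings (standing in for real CLIP encoder outputs) are all assumptions.

```python
import numpy as np

def ground_prompt(prompt_emb, instance_embs, threshold=0.25):
    """Select the instance whose CLIP feature best matches the prompt.

    prompt_emb: (d,) CLIP text embedding of the user prompt.
    instance_embs: (n, d) one pooled CLIP feature per segmented instance.
    Returns the index of the best-matching instance if its cosine
    similarity exceeds `threshold`, else None (no grounded target).
    """
    p = prompt_emb / np.linalg.norm(prompt_emb)
    x = instance_embs / np.linalg.norm(instance_embs, axis=1, keepdims=True)
    sims = x @ p                          # cosine similarity per instance
    best = int(np.argmax(sims))
    return best if sims[best] > threshold else None

# Toy demo with placeholder embeddings (real use: CLIP image/text encoders).
rng = np.random.default_rng(0)
embs = rng.normal(size=(5, 512))               # 5 hypothetical instances
prompt = embs[3] + 0.1 * rng.normal(size=512)  # prompt close to instance 3
print(ground_prompt(prompt, embs))             # -> 3
```

Once an instance index is grounded, the corresponding subset of Gaussians can be edited directly (recolored, moved, or deleted), which is what allows the pipeline to avoid per-edit 2D-to-3D projection.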