In this paper, we target the adaptive source-driven 3D scene editing task by proposing CustomNeRF, a model that unifies text descriptions and reference images as editing prompts. However, obtaining editing results that conform to the editing prompt is nontrivial because of two significant challenges: accurately editing only the foreground regions, and maintaining multi-view consistency given a single-view reference image. To tackle the first challenge, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground region editing and full-image editing, aiming at foreground-only manipulation while preserving the background. For the second challenge, we design a class-guided regularization that exploits class priors within the generation model to alleviate inconsistency among different views in image-driven editing. Extensive experiments show that CustomNeRF produces precise editing results on various real scenes in both text- and image-driven settings.