Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly because the scene representation is implicit. Consistently blending a new, realistic object into the scene adds a further level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest (ROI) in an existing NeRF scene, guided by a text prompt or image patch together with a 3D ROI box. Our method uses a pretrained language-image model to steer the synthesis towards the user-provided text prompt or image patch, combined with a 3D MLP model initialized on the existing NeRF scene that generates the object and blends it into the specified region of the original scene. We enable local editing by localizing a 3D ROI box in the input scene, and we seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural-looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations to improve the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic, multi-view consistent results with greater flexibility and diversity than the baselines. Finally, we show the applicability of our framework to several 3D editing applications, including adding new objects to a scene, removing, replacing, or altering existing objects, and texture conversion.
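The abstract names a volumetric blending step inside the ROI box but does not give its formula. As a rough illustration only, the sketch below assumes one plausible form: per-sample densities and colors from the original scene and from the newly synthesized content are mixed with a weight that falls off linearly with distance from the box center, and the mixed samples are then composited with standard NeRF quadrature. The function names (`blend_weight`, `blend_fields`, `render_ray`) and the linear falloff are hypothetical choices made for this example, not the authors' implementation.

```python
import numpy as np

def blend_weight(points, box_center, box_half_extent):
    """Hypothetical blend weight: 1 at the box center, 0 on the faces and outside.

    points: (N, 3) sample positions; box_center, box_half_extent: (3,) arrays.
    """
    # Normalised max-norm distance: 0 at the center, 1 on the box faces.
    d = np.max(np.abs(points - box_center) / box_half_extent, axis=-1)
    # Assumed linear falloff; samples outside the box get weight 0.
    return np.clip(1.0 - d, 0.0, 1.0)

def blend_fields(sigma_old, rgb_old, sigma_new, rgb_new, w):
    """Mix densities (N,) and colors (N, 3) of the two fields per sample."""
    sigma = (1.0 - w) * sigma_old + w * sigma_new
    rgb = (1.0 - w)[:, None] * rgb_old + w[:, None] * rgb_new
    return sigma, rgb

def render_ray(sigma, rgb, deltas):
    """Standard NeRF quadrature along one ray.

    sigma: (N,) densities, rgb: (N, 3) colors, deltas: (N,) sample spacings.
    """
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance before each sample: product of (1 - alpha) of earlier samples.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0)
```

Under this assumption, samples outside the ROI box keep the original scene's density and color unchanged, so only rays that pass through the box are affected by the edit.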