Edit-DiffNeRF: Editing 3D Neural Radiance Fields using 2D Diffusion Model

Recent research has shown that combining pretrained diffusion models with neural radiance fields (NeRFs) is a promising approach to text-to-3D generation. However, simply coupling a NeRF with a diffusion model results in cross-view inconsistency and degraded stylized view synthesis. To address this challenge, we propose the Edit-DiffNeRF framework, which is composed of a frozen diffusion model, a delta module that edits the latent semantic space of the diffusion model, and a NeRF. Instead of training the entire diffusion model for each scene, our method edits the latent semantic space of the frozen pretrained diffusion model through the delta module. This change to the standard diffusion framework enables us to make fine-grained modifications to the rendered views and to consolidate these modifications in a 3D scene via NeRF training. As a result, we are able to produce an edited 3D scene that faithfully aligns with the input text instructions. Furthermore, to ensure semantic consistency across different viewpoints, we propose a novel multi-view semantic consistency loss that extracts a latent semantic embedding from the input view as a prior and aims to reconstruct it in different views. Our method effectively edits real-world 3D scenes, yielding a 25% improvement in the alignment of the performed 3D edits with text instructions compared to prior work.
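The abstract does not give the form of the delta module or of the multi-view semantic consistency loss, so the following is only a minimal sketch under stated assumptions: the delta module is modelled as an additive shift on a frozen latent, and the loss as a mean squared error between the input view's embedding (the prior) and the embeddings of the other views. The function names `apply_delta` and `multi_view_consistency_loss`, the additive form, and the squared-error distance are all hypothetical choices made for illustration, not the paper's definitions.

```python
import numpy as np


def apply_delta(frozen_latent, delta):
    """Shift a latent from the frozen model by a learned offset.

    Assumption: the edit is additive. The frozen latent itself is not modified.
    """
    return np.asarray(frozen_latent, dtype=np.float64) + np.asarray(delta, dtype=np.float64)


def multi_view_consistency_loss(reference_embedding, view_embeddings):
    """Mean squared error between a reference embedding and each view's embedding.

    reference_embedding: array of shape (d,), taken from the input view (the prior).
    view_embeddings:     array of shape (num_views, d), one embedding per other view.
    Assumption: squared error is used as the distance between embeddings.
    """
    ref = np.asarray(reference_embedding, dtype=np.float64)
    views = np.asarray(view_embeddings, dtype=np.float64)
    if views.ndim != 2 or ref.ndim != 1 or views.shape[1] != ref.shape[0]:
        raise ValueError("expected reference of shape (d,) and views of shape (num_views, d)")
    diff = views - ref  # the reference is broadcast across all views
    return float(np.mean(diff ** 2))


# Toy usage: one view matches the reference exactly, the other is offset by 2.
reference = np.ones(4)
views = np.stack([np.ones(4), 3.0 * np.ones(4)])
loss = multi_view_consistency_loss(reference, views)
edited = apply_delta(np.zeros(4), 0.5 * np.ones(4))
```

In this toy setup the loss is zero only when every view reproduces the reference embedding, which is the behaviour a cross-view consistency term is meant to encourage.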
