We present GenN2N, a unified NeRF-to-NeRF translation framework for various NeRF translation tasks such as text-driven NeRF editing, colorization, super-resolution, inpainting, etc. Unlike previous methods designed for individual translation tasks with task-specific schemes, GenN2N achieves all these NeRF editing tasks by employing a plug-and-play image-to-image translator to perform editing in the 2D domain and lifting 2D edits into the 3D NeRF space. Since the 3D consistency of 2D edits may not be assured, we propose to model the distribution of the underlying 3D edits through a generative model that can cover all possible edited NeRFs. To model the distribution of 3D edited NeRFs from 2D edited images, we carefully design a VAE-GAN that encodes images while decoding NeRFs. The latent space is trained to align with a Gaussian distribution, and the NeRFs are supervised through an adversarial loss on their renderings. To ensure the latent code does not depend on 2D viewpoints but truly reflects the 3D edits, we also regularize the latent code through a contrastive learning scheme. Extensive experiments on various editing tasks show that GenN2N, as a universal framework, performs as well as or better than task-specific specialists while possessing flexible generative power. More results are available on our project page: https://xiangyueliu.github.io/GenN2N/
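The latent regularization described above combines two terms: a KL term aligning the encoder's latent distribution with a standard Gaussian, and a contrastive term that makes latent codes reflect the 3D edit rather than the 2D viewpoint. The sketch below illustrates these two losses in isolation; it is a minimal NumPy illustration under our own assumptions (function names, an InfoNCE-style contrastive formulation, and cosine similarity are our choices for exposition), not the authors' implementation:

```python
import numpy as np

def kl_to_standard_gaussian(mu, log_var):
    # KL(q(z|x) || N(0, I)) for a diagonal-Gaussian encoder output,
    # summed over latent dimensions. Pulls the latent space toward
    # a standard Gaussian, as in a VAE.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def contrastive_latent_loss(z_anchor, z_pos, z_negs, tau=0.1):
    # InfoNCE-style loss (our illustrative choice): latent codes from
    # different viewpoints of the SAME 3D edit (z_anchor, z_pos) are
    # pulled together, while codes from DIFFERENT edits (z_negs) are
    # pushed apart, so the code encodes the edit, not the viewpoint.
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(z_anchor, z_pos)] +
                      [cos(z_anchor, z_n) for z_n in z_negs]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive pair should win
```

In training, these terms would be added to the reconstruction and adversarial losses; here they only show the intended geometry of the latent space: a unit-Gaussian prior plus viewpoint-invariant, edit-discriminative codes.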