GD$^2$-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields

In this paper, we focus on the One-shot Novel View Synthesis (O-NVS) task, which aims to synthesize photo-realistic novel views given only one reference image per scene. Previous One-shot Generalizable Neural Radiance Fields (OG-NeRF) methods solve this task without inference-time finetuning, yet suffer from blurry outputs because their encoder-only architecture relies heavily on the limited reference image. On the other hand, recent diffusion-based image-to-3D methods produce vivid, plausible results by distilling pre-trained 2D diffusion models into a 3D representation, yet require tedious per-scene optimization. To address these issues, we propose GD$^2$-NeRF, a Generative Detail compensation framework via GAN and Diffusion that requires no inference-time finetuning and still produces vivid, plausible details. Specifically, following a coarse-to-fine strategy, GD$^2$-NeRF is mainly composed of a One-stage Parallel Pipeline (OPP) and a 3D-consistent Detail Enhancer (Diff3DE). At the coarse stage, OPP efficiently inserts a GAN model into the existing OG-NeRF pipeline to first relieve the blurriness using in-distribution priors captured from the training dataset, achieving a good balance between sharpness (LPIPS, FID) and fidelity (PSNR, SSIM). Then, at the fine stage, Diff3DE leverages pre-trained image diffusion models to add rich out-of-distribution details while maintaining decent 3D consistency. Extensive experiments on both synthetic and real-world datasets show that GD$^2$-NeRF noticeably improves details without per-scene finetuning.
