In this paper, we focus on the One-shot Novel View Synthesis (O-NVS) task, which aims to synthesize photo-realistic novel views given only one reference image per scene. Previous One-shot Generalizable Neural Radiance Fields (OG-NeRF) methods solve this task without inference-time finetuning, yet they suffer from blurriness because their encoder-only architecture relies heavily on the limited reference image. On the other hand, recent diffusion-based image-to-3D methods produce vivid, plausible results by distilling pre-trained 2D diffusion models into a 3D representation, yet they require tedious per-scene optimization. To address these issues, we propose GD$^2$-NeRF, a Generative Detail compensation framework via GAN and Diffusion that is both inference-time finetuning-free and capable of producing vivid, plausible details. Specifically, following a coarse-to-fine strategy, GD$^2$-NeRF is mainly composed of a One-stage Parallel Pipeline (OPP) and a 3D-consistent Detail Enhancer (Diff3DE). At the coarse stage, OPP efficiently inserts a GAN model into the existing OG-NeRF pipeline, primarily to relieve the blurriness with in-distribution priors captured from the training dataset, achieving a good balance between sharpness (LPIPS, FID) and fidelity (PSNR, SSIM). Then, at the fine stage, Diff3DE further leverages pre-trained image diffusion models to complement rich out-of-distribution details while maintaining decent 3D consistency. Extensive experiments on both synthetic and real-world datasets show that GD$^2$-NeRF noticeably improves details without per-scene finetuning.