NeRF-VPT: Learning Novel View Representations with Neural Radiance Fields via View Prompt Tuning

Neural Radiance Fields (NeRF) have achieved remarkable success in novel view synthesis. Nonetheless, generating high-quality images for novel views remains a critical challenge. Although existing methods have made considerable progress, capturing intricate details, enhancing textures, and achieving higher Peak Signal-to-Noise Ratio (PSNR) still warrant further attention. In this work, we propose NeRF-VPT, a novel view synthesis method that addresses these challenges. NeRF-VPT employs a cascading view prompt tuning paradigm, in which RGB information from the renderings of a preceding stage serves as a visual prompt for the subsequent rendering stage, so that the prior knowledge embedded in the prompts can progressively improve the quality of the rendered images. At each training stage, NeRF-VPT only requires sampling RGB data from the previous stage's renderings as priors, without relying on extra guidance or complex techniques. NeRF-VPT is therefore plug-and-play and can be readily integrated into existing methods. Through comparisons with several NeRF-based approaches on demanding benchmarks, including Realistic Synthetic 360, Real Forward-Facing, Replica, and a user-captured dataset, we show that NeRF-VPT significantly improves baseline performance and generates higher-quality novel view images than all the compared state-of-the-art methods. Furthermore, the cascading learning of NeRF-VPT adapts to scenarios with sparse inputs, yielding a significant accuracy improvement for sparse-view novel view synthesis. The source code and dataset are available at \url{https://github.com/Freedomcls/NeRF-VPT}.
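To make the data flow of the cascade concrete, the following is a minimal sketch and not the released implementation. It assumes that the visual prompt is concatenated to per-ray input features and that the first stage receives an all-zero prompt; both choices, along with the names `build_stage_input`, `run_cascade`, and `make_toy_stage`, are hypothetical. The toy stage is a stand-in for a trained NeRF and simply moves its prompt part of the way toward a target color.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for values in [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


def build_stage_input(ray_features, prompt_rgb):
    """Append the RGB prompt to each ray's features (assumed conditioning scheme)."""
    return np.concatenate([ray_features, prompt_rgb], axis=-1)


def run_cascade(ray_features, stage_fns):
    """Run the stages in order; each stage's render becomes the next stage's prompt."""
    # Assumption: the first stage has no earlier render, so its prompt is all zeros.
    prompt = np.zeros((ray_features.shape[0], 3))
    renders = []
    for stage in stage_fns:
        rgb = stage(build_stage_input(ray_features, prompt))
        renders.append(rgb)
        prompt = rgb  # RGB from this stage is the prior for the next one
    return renders


def make_toy_stage(target_rgb, blend=0.5):
    """Hypothetical stand-in for a trained NeRF stage, for illustration only."""
    def stage(stage_input):
        prompt = stage_input[:, -3:]  # the last three channels hold the prompt
        return prompt + blend * (target_rgb - prompt)
    return stage


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.uniform(0.1, 1.0, size=(8, 3))  # ground-truth colors for 8 rays
    features = rng.normal(size=(8, 5))           # placeholder per-ray features
    renders = run_cascade(features, [make_toy_stage(target)] * 3)
    for i, rgb in enumerate(renders, start=1):
        print(f"stage {i}: PSNR = {psnr(rgb, target):.2f} dB")
```

In this toy setting each stage halves the remaining error, so PSNR rises by about 6.02 dB per stage. That figure is a property of the stand-in stage only and says nothing about the gains of the actual method.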
