Neural radiance fields~(NeRF) have recently been applied to render large-scale scenes, but their limited model capacity typically results in blurred renderings. Existing large-scale NeRFs primarily address this limitation by partitioning the scene into blocks, which are then handled by separate sub-NeRFs. Because these sub-NeRFs are trained from scratch and processed independently, they introduce inconsistencies in geometry and appearance across the scene. Consequently, rendering quality fails to improve significantly despite the expanded model capacity. In this work, we present the global-guided focal neural radiance field (GF-NeRF), which achieves high-fidelity rendering of large-scale scenes. GF-NeRF adopts a two-stage (global and focal) architecture together with a global-guided training strategy: the global stage obtains a continuous representation of the entire scene, while the focal stage decomposes the scene into multiple blocks and processes them with distinct sub-encoders. With this two-stage architecture, the sub-encoders only need fine-tuning from the global encoder, which reduces training complexity in the focal stage while maintaining scene-wide consistency. Spatial and error information from the global stage also help the sub-encoders focus on crucial areas and capture more details of large-scale scenes. Notably, our approach does not rely on any prior knowledge about the target scene, making GF-NeRF adaptable to various large-scale scene types, including street-view and aerial-view scenes. We demonstrate that our method achieves high-fidelity, natural rendering results on various types of large-scale datasets. Our project page: https://shaomq2187.github.io/GF-NeRF/
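The two-stage training idea can be illustrated schematically: a global encoder is trained on the whole scene, sub-encoders are initialized from it rather than from scratch, and the global stage's per-block rendering error guides how much fine-tuning each block receives. This is a minimal sketch, not the paper's implementation; the `Encoder` class, the error values, and the ray budget are all hypothetical placeholders.

```python
import copy

class Encoder:
    """Toy stand-in for a NeRF feature encoder (hypothetical)."""
    def __init__(self):
        self.weights = [0.0] * 8

    def train_step(self, lr=0.1):
        # Placeholder for a real gradient update.
        self.weights = [w + lr for w in self.weights]

# Global stage: a single encoder covers the whole scene,
# yielding a coarse but scene-wide consistent representation.
global_encoder = Encoder()
for _ in range(5):
    global_encoder.train_step()

# Focal stage: partition the scene into blocks; each sub-encoder is
# initialized from the global encoder instead of from scratch, so
# geometry and appearance stay consistent across block boundaries.
num_blocks = 4
sub_encoders = [copy.deepcopy(global_encoder) for _ in range(num_blocks)]

# Global-guided training: blocks with larger rendering error in the
# global stage receive a larger share of the fine-tuning ray budget
# (error values here are illustrative only).
errors = [0.4, 0.1, 0.3, 0.2]
ray_budget = 1000
rays_per_block = [round(ray_budget * e / sum(errors)) for e in errors]

for enc, n_rays in zip(sub_encoders, rays_per_block):
    for _ in range(n_rays // 250):  # fine-tuning steps scale with error
        enc.train_step(lr=0.01)
```

The key design choice mirrored here is that fine-tuning starts from the shared global weights, so per-block specialization adds detail without diverging from the scene-wide representation.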