BirdNeRF: Fast Neural Reconstruction of Large-Scale Scenes From Aerial Imagery

We introduce BirdNeRF, an adaptation of Neural Radiance Fields (NeRF) designed for reconstructing large-scale scenes from aerial imagery. Unlike previous work, which focuses on small-scale, object-centric NeRF reconstruction, our approach addresses three challenges: (1) the slow training and rendering associated with large models; (2) the computational cost of modeling a large number of images, which demands extensive resources such as high-performance GPUs; and (3) the significant artifacts and low visual fidelity commonly observed in large-scale reconstruction due to limited model capacity. Specifically, we present a novel bird-view pose-based spatial decomposition algorithm that splits a large aerial image set into multiple small sets with appropriately sized overlaps, allowing us to train an individual NeRF for each sub-scene. This decomposition decouples rendering time from scene size, so rendering scales to arbitrarily large environments, and it allows per-block updates of the environment, making the reconstruction process more flexible. We also propose a projection-guided novel view re-rendering strategy that makes effective use of the independently trained sub-scenes to improve rendering quality. We evaluate our approach on existing datasets and on our own drone footage. On a single GPU, it reconstructs scenes 10x faster than classical photogrammetry software and 50x faster than a state-of-the-art large-scale NeRF solution, with similar rendering quality.
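To make the idea of a pose-based decomposition with overlaps concrete, the following is a minimal sketch and not the BirdNeRF algorithm itself, whose details are not given in this abstract. It assumes that each image comes with a camera position projected onto the ground plane, that the scene is split on a regular grid, and that the overlap is a fixed fraction of the block size. The function name `decompose_by_pose` and the parameters `grid` and `overlap` are hypothetical.

```python
from typing import Dict, List, Tuple


def decompose_by_pose(
    positions: List[Tuple[float, float]],
    grid: Tuple[int, int] = (2, 2),
    overlap: float = 0.2,
) -> Dict[Tuple[int, int], List[int]]:
    """Assign image indices to overlapping rectangular blocks.

    positions: bird-view (x, y) camera positions, one per image (assumed input).
    grid:      number of blocks along x and y (hypothetical parameter).
    overlap:   margin added on each side of a block, as a fraction of the
               block size (hypothetical parameter).
    """
    if not positions:
        return {}

    xs = [p[0] for p in positions]
    ys = [p[1] for p in positions]
    x_min, y_min = min(xs), min(ys)
    nx, ny = grid

    # Block size along each axis; fall back to 1.0 if all cameras share a coordinate.
    w = (max(xs) - x_min) / nx or 1.0
    h = (max(ys) - y_min) / ny or 1.0

    blocks: Dict[Tuple[int, int], List[int]] = {}
    for i in range(nx):
        for j in range(ny):
            # Expand the nominal block by the overlap margin on every side.
            x0 = x_min + i * w - overlap * w
            x1 = x_min + (i + 1) * w + overlap * w
            y0 = y_min + j * h - overlap * h
            y1 = y_min + (j + 1) * h + overlap * h
            blocks[(i, j)] = [
                k
                for k, (x, y) in enumerate(positions)
                if x0 <= x <= x1 and y0 <= y <= y1
            ]
    return blocks


if __name__ == "__main__":
    cams = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
    for block_id, image_ids in decompose_by_pose(cams).items():
        print(block_id, image_ids)
```

Under these assumptions, each block's image list would be used to train one sub-scene NeRF independently. Images near a block boundary fall into several blocks, which is what gives neighbouring sub-scenes shared coverage.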
