Generating novel views from recorded videos is crucial for enabling autonomous UAV navigation. Recent advancements in neural rendering have facilitated the rapid development of methods capable of rendering new trajectories. However, these methods often fail to generalize to regions far from the training data unless the flight path is optimized, leading to suboptimal reconstructions. We propose a self-supervised cyclic neural-analytic pipeline that combines high-quality neural rendering outputs with precise geometric insights from analytical methods. Our solution improves RGB and mesh reconstructions for novel view synthesis, especially in undersampled areas and in regions entirely different from the training dataset. We use an effective transformer-based architecture for image reconstruction to refine and adapt the synthesis process, enabling effective handling of novel, unseen poses without relying on extensive labeled datasets. Our findings demonstrate substantial improvements in both novel view rendering and 3D reconstruction, which to the best of our knowledge is a first, setting a new standard for autonomous navigation in complex outdoor environments.