In this work, we consider the problem of learning end-to-end perception-to-control for ground vehicles solely from aerial imagery. Photogrammetric simulators allow the synthesis of novel views through the transformation of pre-generated assets, but they have a large setup cost, require careful data collection, and often demand human effort to create usable simulators. We instead use a Neural Radiance Field (NeRF) as an intermediate representation to synthesize novel views from the viewpoint of a ground vehicle. These novel viewpoints can then be used for several downstream autonomous navigation applications. In this work, we demonstrate the utility of novel view synthesis through the application of training a policy for end-to-end learning from images and depth data. In a traditional real-to-sim-to-real framework, the collected data would be transformed into a visual simulator, which could then be used to generate novel views. In contrast, using a NeRF allows a compact representation and the ability to optimize over the parameters of the visual simulator as more data is gathered in the environment. We demonstrate the efficacy of our method in a custom-built mini-city environment through the deployment of imitation policies on robotic cars. We additionally consider the task of place localization and demonstrate that our method is able to relocalize the car in the real world.