TerrainNet: Visual Modeling of Complex Terrain for High-speed, Off-road Navigation

Xiangyun Meng, Nathan Hatch, Alexander Lambert, Anqi Li, Nolan Wagener, Matthew Schmittle, JoonHo Lee, Wentao Yuan, Zoey Chen, Samuel Deng, Greg Okopal, Dieter Fox, Byron Boots, Amirreza Shaban

Effective use of camera-based vision systems is essential for robust performance in autonomous off-road driving, particularly in the high-speed regime. Despite success in structured, on-road settings, current end-to-end approaches for scene prediction have yet to be successfully adapted for complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system for semantic and geometric terrain prediction for aggressive, off-road navigation. The approach relies on several key insights and practical considerations for achieving reliable terrain modeling. The network includes a multi-headed output representation to capture the fine- and coarse-grained terrain features necessary for estimating traversability. Accurate depth estimation is achieved using self-supervised depth completion with multi-view RGB and stereo inputs. Real-time inference requirements are met using efficient, learned image feature projections. Furthermore, the model is trained on a large-scale, real-world off-road dataset collected across diverse outdoor environments. We show how TerrainNet can also be used for costmap prediction and provide a detailed framework for integration into a planning module. We demonstrate the performance of TerrainNet through extensive comparison to current state-of-the-art baselines for camera-only scene prediction. Finally, we showcase the effectiveness of integrating TerrainNet within a complete autonomous-driving stack by conducting a real-world vehicle test in a challenging off-road scenario.
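
To make the link between semantic and geometric terrain predictions and a planner costmap concrete, the following is a minimal illustrative sketch, not the method described in this work. It assumes a bird's-eye-view grid in which each cell carries semantic class probabilities and a ground elevation, and it combines a class-weighted cost with a slope-based cost. The class names, the cost values, the `Cell` type, and the functions `semantic_cost`, `slope_cost`, and `costmap` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical per-class traversal costs in [0, 1]; these are assumptions,
# not values from TerrainNet.
CLASS_COST: Dict[str, float] = {"dirt": 0.0, "grass": 0.2, "bush": 0.6, "rock": 1.0}


@dataclass
class Cell:
    class_probs: Dict[str, float]  # assumed semantic output, sums to 1
    elevation_m: float             # assumed geometric output (ground height)


def semantic_cost(cell: Cell) -> float:
    """Expected traversal cost under the cell's class distribution."""
    return sum(p * CLASS_COST[name] for name, p in cell.class_probs.items())


def slope_cost(grid: List[List[Cell]], r: int, c: int,
               cell_size_m: float, max_slope: float) -> float:
    """Steepest elevation gradient to a 4-neighbor, scaled and clipped to [0, 1]."""
    h = grid[r][c].elevation_m
    steepest = 0.0
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            rise = abs(grid[rr][cc].elevation_m - h)
            steepest = max(steepest, rise / cell_size_m)
    return min(steepest / max_slope, 1.0)


def costmap(grid: List[List[Cell]], cell_size_m: float = 0.5,
            max_slope: float = 1.0) -> List[List[float]]:
    """Per-cell cost: the worse of the semantic and the slope cost."""
    return [
        [max(semantic_cost(grid[r][c]),
             slope_cost(grid, r, c, cell_size_m, max_slope))
         for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]
```

Taking the maximum of the two terms is one conservative design choice among many: a cell is penalized if either its predicted class or its local slope looks hazardous. A real system would likely tune or learn this combination.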