TerrainNet: Visual Modeling of Complex Terrain for High-speed, Off-road Navigation

Xiangyun Meng, Nathan Hatch, Alexander Lambert, Anqi Li, Nolan Wagener, Matthew Schmittle, JoonHo Lee, Wentao Yuan, Zoey Chen, Samuel Deng, Greg Okopal, Dieter Fox, Byron Boots, Amirreza Shaban

Effective use of camera-based vision systems is essential for robust performance in autonomous off-road driving, particularly in the high-speed regime. Despite success in structured, on-road settings, current end-to-end approaches for scene prediction have yet to be successfully adapted for complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system for semantic and geometric terrain prediction for aggressive, off-road navigation. The approach relies on several key insights and practical considerations for achieving reliable terrain modeling. The network includes a multi-headed output representation to capture fine- and coarse-grained terrain features necessary for estimating traversability. Accurate depth estimation is achieved using self-supervised depth completion with multi-view RGB and stereo inputs. Requirements for real-time performance and fast inference speeds are met using efficient, learned image feature projections. Furthermore, the model is trained on a large-scale, real-world off-road dataset collected across a variety of diverse outdoor environments. We show how TerrainNet can also be used for costmap prediction and provide a detailed framework for integration into a planning module. We demonstrate the performance of TerrainNet through extensive comparison to current state-of-the-art baselines for camera-only scene prediction. Finally, we showcase the effectiveness of integrating TerrainNet within a complete autonomous-driving stack by conducting a real-world vehicle test in a challenging off-road scenario.
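To make the costmap idea concrete, the following is a minimal sketch of how a semantic grid and an elevation grid could be combined into a single traversal cost grid. It is an illustration under stated assumptions, not the method used in TerrainNet: the class names, the per-class costs in CLASS_COST, the max_step threshold, and the rule of taking the maximum of semantic and geometric cost are all hypothetical choices made for this example.

```python
from typing import Dict, List

# Hypothetical per-class traversal costs in [0, 1]; not values from the paper.
CLASS_COST: Dict[str, float] = {"dirt": 0.0, "grass": 0.2, "bush": 0.6, "rock": 1.0}


def costmap(semantics: List[List[str]],
            elevation: List[List[float]],
            max_step: float = 0.3) -> List[List[float]]:
    """Combine a semantic grid and an elevation grid into one cost grid.

    Cost per cell = max(semantic cost, geometric cost). The geometric cost
    is the largest height difference to a 4-connected neighbor, divided by
    max_step (an assumed threshold in meters) and clipped to 1.
    """
    rows, cols = len(semantics), len(semantics[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            step = 0.0
            # Largest elevation change to any in-bounds 4-connected neighbor.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    step = max(step, abs(elevation[r][c] - elevation[nr][nc]))
            geometric = min(step / max_step, 1.0)
            out[r][c] = max(CLASS_COST[semantics[r][c]], geometric)
    return out


# Toy 2x2 bird's-eye-view grid: labels per cell and heights in meters.
semantics = [["dirt", "grass"], ["dirt", "rock"]]
elevation = [[0.0, 0.0], [0.0, 0.15]]
grid = costmap(semantics, elevation)
```

Taking the maximum keeps a cell expensive if either cue flags it, so a flat patch labeled as rock and a steep patch labeled as dirt are both penalized. A planner could then sum these cell costs along candidate trajectories.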
