Effective use of camera-based vision systems is essential for robust autonomous off-road driving, particularly at high speed. Despite their success in structured, on-road settings, current end-to-end approaches to scene prediction have yet to be adapted successfully to complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system that predicts semantic and geometric terrain properties for aggressive off-road navigation. The approach relies on several key insights and practical considerations for reliable terrain modeling. The network uses a multi-headed output representation to capture the fine- and coarse-grained terrain features needed to estimate traversability. Accurate depth estimation is achieved through self-supervised depth completion with multi-view RGB and stereo inputs. Real-time inference requirements are met using efficient, learned image feature projections. The model is trained on a large-scale, real-world off-road dataset collected across a variety of outdoor environments. We show how TerrainNet can also be used for costmap prediction and provide a detailed framework for integrating it into a planning module. We evaluate TerrainNet through extensive comparison with current state-of-the-art baselines for camera-only scene prediction. Finally, we demonstrate the effectiveness of integrating TerrainNet into a complete autonomous-driving stack through a real-world vehicle test in a challenging off-road scenario.
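To make the costmap idea concrete, the following is a minimal sketch of how per-cell semantic and geometric terrain predictions could be combined into a traversal cost. It is not the method used in TerrainNet: the class names, the per-class costs, the slope-based geometric term, and the helper names `cell_cost`, `slope_at`, and `build_costmap` are all assumptions chosen for illustration.

```python
from math import hypot

# Hypothetical terrain classes and costs, chosen for illustration only
# (they are not taken from the paper).
CLASS_COSTS = {"trail": 0.0, "grass": 0.3, "obstacle": 1.0}


def cell_cost(class_probs, slope, slope_weight=0.5, max_slope=1.0):
    """Blend expected semantic cost with a normalized slope penalty."""
    # Expected cost under the predicted class distribution for this cell.
    semantic = sum(CLASS_COSTS[name] * p for name, p in class_probs.items())
    # Slope (rise over run) saturates at max_slope; both terms lie in [0, 1].
    geometric = min(abs(slope) / max_slope, 1.0)
    return (1.0 - slope_weight) * semantic + slope_weight * geometric


def slope_at(elevation, r, c, cell_size):
    """Finite-difference slope magnitude, clamping indices at the border."""
    rows, cols = len(elevation), len(elevation[0])
    r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
    c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
    dz_dr = (elevation[r1][c] - elevation[r0][c]) / (max(r1 - r0, 1) * cell_size)
    dz_dc = (elevation[r][c1] - elevation[r][c0]) / (max(c1 - c0, 1) * cell_size)
    return hypot(dz_dr, dz_dc)


def build_costmap(semantic, elevation, cell_size=0.5):
    """semantic: grid of {class: prob} dicts; elevation: grid of heights in meters."""
    return [
        [
            cell_cost(semantic[r][c], slope_at(elevation, r, c, cell_size))
            for c in range(len(elevation[0]))
        ]
        for r in range(len(elevation))
    ]
```

In this sketch a flat cell that is certainly trail costs 0, while steep or obstacle-like cells approach 1. A planner would typically consume such a grid as a per-cell penalty; the equal weighting of the two terms is arbitrary and would need tuning in practice.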