ViPlanner: Visual Semantic Imperative Learning for Local Navigation

Real-time path planning in outdoor environments remains challenging for modern robotic systems because of variations in terrain traversability, diverse obstacles, and the need for fast decision-making. Established approaches have primarily focused on geometric navigation, which works well for structured geometric obstacles but is limited in its semantic interpretation of different terrain types and their affordances. Moreover, these methods fail to recognize traversable geometric structures, such as stairs. To overcome these issues, we introduce ViPlanner, a learned local path planning approach that generates local plans based on both geometric and semantic information. The system is trained using the Imperative Learning paradigm, in which the network weights are optimized end-to-end with respect to the planning task objective. This optimization uses a differentiable formulation of a semantic costmap, which enables the planner to distinguish between the traversability of different terrains and to accurately identify obstacles. The semantic information is represented in 30 classes using an RGB colorspace that can effectively encode multiple levels of traversability. We show that the planner adapts to diverse real-world environments without requiring any real-world training: it is trained purely in simulation, which enables highly scalable training data generation. Experimental results demonstrate robustness to noise, zero-shot sim-to-real transfer, and a 38.02% decrease in traversability cost compared to purely geometric approaches. Code and models are publicly available: https://github.com/leggedrobotics/viplanner.
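To illustrate what a differentiable semantic costmap can look like, the following is a minimal sketch and not the ViPlanner implementation. It assumes a small grid of semantic classes with hypothetical traversability costs (the class names and values in `CLASS_COST` are invented for illustration), reads the cost at continuous waypoint coordinates by bilinear interpolation, and returns the analytic gradient of the mean path cost with respect to each waypoint. In the paper the gradient of the planning objective is propagated to the network weights; here, as a simplification, the waypoints themselves are updated directly.

```python
# Hypothetical traversability cost per semantic class (illustrative values only).
CLASS_COST = {"sidewalk": 0.0, "grass": 0.3, "road": 0.5, "obstacle": 1.0}

# Toy semantic map: every row runs from cheap terrain (left) to an obstacle (right).
SEMANTIC_GRID = [["sidewalk", "grass", "road", "obstacle"] for _ in range(3)]
COSTMAP = [[CLASS_COST[name] for name in row] for row in SEMANTIC_GRID]


def bilinear_cost(costmap, x, y):
    """Return (cost, dcost/dx, dcost/dy) at continuous cell coordinates (x, y)."""
    rows, cols = len(costmap), len(costmap[0])
    # Clamp so that the 2x2 interpolation neighbourhood stays inside the grid.
    x = min(max(x, 0.0), cols - 1 - 1e-6)
    y = min(max(y, 0.0), rows - 1 - 1e-6)
    x0, y0 = int(x), int(y)
    tx, ty = x - x0, y - y0
    c00, c01 = costmap[y0][x0], costmap[y0][x0 + 1]
    c10, c11 = costmap[y0 + 1][x0], costmap[y0 + 1][x0 + 1]
    cost = ((1 - tx) * (1 - ty) * c00 + tx * (1 - ty) * c01
            + (1 - tx) * ty * c10 + tx * ty * c11)
    # Partial derivatives of the bilinear interpolant.
    d_dx = (1 - ty) * (c01 - c00) + ty * (c11 - c10)
    d_dy = (1 - tx) * (c10 - c00) + tx * (c11 - c01)
    return cost, d_dx, d_dy


def path_cost(costmap, path):
    """Mean traversability cost of a path and its gradient per waypoint."""
    n = len(path)
    total, grads = 0.0, []
    for x, y in path:
        cost, d_dx, d_dy = bilinear_cost(costmap, x, y)
        total += cost / n
        grads.append((d_dx / n, d_dy / n))
    return total, grads


def gradient_step(costmap, path, lr=1.0):
    """Move every waypoint one gradient-descent step toward lower cost."""
    _, grads = path_cost(costmap, path)
    return [(x - lr * gx, y - lr * gy) for (x, y), (gx, gy) in zip(path, grads)]


if __name__ == "__main__":
    path = [(2.5, 0.5), (2.5, 1.0), (2.5, 1.5)]  # starts between road and obstacle
    before, _ = path_cost(COSTMAP, path)
    after, _ = path_cost(COSTMAP, gradient_step(COSTMAP, path))
    print(f"mean cost before: {before:.3f}, after one step: {after:.3f}")
```

Because the interpolated cost varies smoothly with the waypoint position, the gradient points away from the obstacle column and the update shifts the path toward cheaper terrain. A real planner would add goal-reaching and smoothness terms to this objective; they are omitted here to keep the sketch short.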
