Path planning has been studied extensively. Classic planning pipelines, which chain perception, mapping, and path search into separate modules, can suffer from latency and compounding errors between modules. Recent studies have shown that end-to-end learning methods achieve high planning efficiency, but these methods often struggle to match the generalization ability of classic approaches across different environments. Moreover, end-to-end policy training often requires a large amount of labeled data or many training iterations to converge. In this paper, we present a novel Imperative Learning (IL) approach. It leverages a differentiable cost map to provide implicit supervision during policy training, eliminating the need for demonstrations or labeled trajectories. Furthermore, policy training adopts a Bi-Level Optimization (BLO) process that combines network updates with metric-based trajectory optimization to generate a smooth, collision-free path toward the goal from a single depth measurement. The proposed method allows task-level costs of predicted trajectories to be backpropagated through all components, so the network is updated by direct gradient descent. In our experiments, the method plans around 4x faster than the classic approach and is robust to localization noise. Additionally, the IL approach enables the planner to generalize to various unseen environments, yielding an overall 26-87% improvement in SPL compared to baseline learning methods.
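As a reading aid, the bi-level structure described in the abstract can be written in a generic form. This is a sketch of a standard bi-level optimization template, not necessarily the exact formulation used in the paper. All symbols are assumptions introduced here for illustration: $f_\theta$ is the policy network with parameters $\theta$, $d$ is the depth measurement, $g$ is the goal, $L$ is the lower-level trajectory optimization objective, $U$ is the task-level cost evaluated on the differentiable cost map, and $\eta$ is a learning rate.

```latex
% Generic bi-level template (illustrative; symbols are assumptions, not the paper's notation)
\begin{aligned}
\min_{\theta} \quad & U\big(\tau^{*}(\theta)\big) \\
\text{s.t.} \quad & \tau^{*}(\theta) = \arg\min_{\tau} \; L\big(\tau;\, f_{\theta}(d, g)\big), \\
\text{update:} \quad & \theta \leftarrow \theta - \eta \, \nabla_{\theta}\, U\big(\tau^{*}(\theta)\big).
\end{aligned}
```

Under this reading, the upper level updates the network while the lower level refines the predicted path into a trajectory. Because both the trajectory optimization and the cost map are differentiable, the gradient of the task-level cost can reach $\theta$ without labeled trajectories.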
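The abstract reports results in terms of SPL without expanding the acronym. Assuming it refers to the commonly used navigation metric Success weighted by Path Length, the sketch below shows how such a score is typically computed. The function name `spl` and its argument names are hypothetical and chosen for illustration; this is not the authors' evaluation code.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    For episode i: S_i * l_i / max(p_i, l_i), where S_i is the success flag,
    l_i the shortest-path length, and p_i the length of the path actually taken.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        if denom <= 0:
            raise ValueError("path lengths must be positive")
        # A failed episode contributes 0; a successful one is discounted
        # by how much longer the taken path is than the shortest path.
        total += float(success) * shortest / denom
    return total / n


# Three hypothetical episodes: optimal success, success with a 2x detour, failure.
score = spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 15.0])
print(score)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

Under this definition a planner is rewarded only for reaching the goal, and the reward shrinks as the executed path grows longer than the shortest one, so the score reflects both success rate and path efficiency.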