The problem of path planning has been studied for years. Classic planning pipelines, which chain perception, mapping, and path searching, can suffer from latency and from errors that compound across modules. Recent studies have shown that end-to-end learning methods achieve high planning efficiency, but these methods often struggle to match the generalization ability of classic approaches across different environments. Moreover, end-to-end policy training often requires a large amount of labeled data or many training iterations to converge. In this paper, we present a novel Imperative Learning (IL) approach. It leverages a differentiable cost map to provide implicit supervision during policy training, eliminating the need for demonstrations or labeled trajectories. Policy training follows a Bi-Level Optimization (BLO) process that combines network updates with metric-based trajectory optimization to generate a smooth, collision-free path toward the goal from a single depth measurement. The proposed method allows the costs of predicted trajectories and the task-level loss to be backpropagated through all layers, so the network is updated with direct gradients. In our experiments, the method plans around 4x faster than the classic approach and is robust to localization noise. Additionally, the IL approach enables the planner to generalize to various unseen environments, yielding an overall 26-87% improvement in performance over baseline learning methods.
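To make the idea of implicit supervision from a differentiable cost map concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy grid cost map queried by bilinear interpolation, and it updates the waypoints directly by gradient descent, where the paper would instead backpropagate the same gradients into network weights. All names (`bilinear_cost`, `trajectory_loss`, `refine`), the goal weight, and the step size are hypothetical, and the trajectory-optimization level of BLO is omitted.

```python
# Hypothetical sketch: function names, weights, and the toy map are assumptions.

def bilinear_cost(cost_map, x, y):
    """Interpolated cost at a continuous point (x = column, y = row) and its gradient."""
    rows, cols = len(cost_map), len(cost_map[0])
    # Clamp the query into the map; outside the map the cost is held constant.
    xc = min(max(x, 0.0), cols - 1.0)
    yc = min(max(y, 0.0), rows - 1.0)
    x0 = min(int(xc), cols - 2)
    y0 = min(int(yc), rows - 2)
    tx, ty = xc - x0, yc - y0
    c00, c01 = cost_map[y0][x0], cost_map[y0][x0 + 1]
    c10, c11 = cost_map[y0 + 1][x0], cost_map[y0 + 1][x0 + 1]
    cost = (c00 * (1 - tx) * (1 - ty) + c01 * tx * (1 - ty)
            + c10 * (1 - tx) * ty + c11 * tx * ty)
    dcdx = (c01 - c00) * (1 - ty) + (c11 - c10) * ty
    dcdy = (c10 - c00) * (1 - tx) + (c11 - c01) * tx
    # A clamped coordinate no longer influences the cost, so its gradient is zero.
    if xc != x:
        dcdx = 0.0
    if yc != y:
        dcdy = 0.0
    return cost, (dcdx, dcdy)


def trajectory_loss(cost_map, waypoints, goal, goal_weight=0.5):
    """Sum of map costs over waypoints plus a squared goal distance on the last one."""
    total, grads = 0.0, []
    for x, y in waypoints:
        cost, (gx, gy) = bilinear_cost(cost_map, x, y)
        total += cost
        grads.append([gx, gy])
    dx = waypoints[-1][0] - goal[0]
    dy = waypoints[-1][1] - goal[1]
    total += goal_weight * (dx * dx + dy * dy)
    grads[-1][0] += 2.0 * goal_weight * dx
    grads[-1][1] += 2.0 * goal_weight * dy
    return total, grads


def refine(cost_map, waypoints, goal, lr=0.1, steps=50):
    """Plain gradient descent on the waypoints (a stand-in for a network update)."""
    pts = [list(p) for p in waypoints]
    for _ in range(steps):
        _, grads = trajectory_loss(cost_map, pts, goal)
        for p, g in zip(pts, grads):
            p[0] -= lr * g[0]
            p[1] -= lr * g[1]
    return pts


# Toy map: cost equals the row index, as if a wall lay along the last row.
cost_map = [[float(r)] * 6 for r in range(5)]
start = [(1.0, 2.0), (2.0, 2.0), (3.0, 2.0)]
goal = (4.0, 0.0)

initial_loss, _ = trajectory_loss(cost_map, start, goal)
refined = refine(cost_map, start, goal)
final_loss, _ = trajectory_loss(cost_map, refined, goal)
print(initial_loss, final_loss)
```

No labeled trajectory appears anywhere in the loop: the only training signal is the cost read from the map plus the goal term, which is the sense in which the supervision is implicit. In a learned planner, the waypoints would be network outputs and the same gradients would flow on into the network parameters.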