Learning whom to trust in navigation: dynamically switching between classical and neural planning

Navigation of terrestrial robots is typically addressed either with simultaneous localization and mapping (SLAM) followed by classical planning on the dynamically created maps, or with machine learning (ML), often through end-to-end training with reinforcement learning (RL) or imitation learning (IL). Recently, modular designs have achieved promising results, and hybrid algorithms that combine ML with classical planning have been proposed. Existing methods implement these combinations with hand-crafted functions, which cannot fully exploit the complementary nature of the policies or the complex regularities between scene structure and planning performance. Our work builds on the hypothesis that the strengths and weaknesses of neural and classical planners follow regularities that can be learned from training data, in particular from interactions. This rests on the assumption that both trained planners and the mapping algorithms underlying classical planning have failure cases that depend on the semantics of the scene, and that this dependence is learnable: for instance, certain areas, objects, or scene structures can be reconstructed more easily than others. We propose a hierarchical method in which a high-level planner dynamically switches between a classical and a neural planner. We train all neural policies entirely in simulation and evaluate the method both in simulation and in real-world experiments with a LoCoBot robot, showing significant gains in performance, in particular in the real environment. We also offer qualitative conjectures about the nature of the data regularities exploited by the high-level planner.
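To make the hierarchical structure concrete, the following is a minimal sketch of a high-level component that picks, at each step, which of two low-level planners to query. It is an illustration under stated assumptions, not the method's implementation: the names `SwitchingNavigator`, `trust_neural`, and `threshold`, the flat observation vector, and the string actions are all hypothetical, and the fixed threshold merely stands in for a learned high-level policy, whose form the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical types, chosen only for illustration.
Observation = Sequence[float]  # e.g. a flat feature vector
Action = str                   # e.g. a discrete motion command


@dataclass
class SwitchingNavigator:
    """Delegates each step to either a classical or a neural planner."""

    classical_planner: Callable[[Observation], Action]
    neural_planner: Callable[[Observation], Action]
    # Stand-in for the learned high-level policy: returns a score in [0, 1]
    # expressing how much the neural planner should be trusted here.
    trust_neural: Callable[[Observation], float]
    threshold: float = 0.5  # assumption: a simple cut-off on the score

    def act(self, obs: Observation) -> Tuple[str, Action]:
        """Return the name of the chosen planner and the action it proposes."""
        use_neural = self.trust_neural(obs) >= self.threshold
        planner = self.neural_planner if use_neural else self.classical_planner
        return ("neural" if use_neural else "classical"), planner(obs)


# Toy stand-ins so the sketch runs end to end.
def classical_stub(obs: Observation) -> Action:
    return "follow_map_path"


def neural_stub(obs: Observation) -> Action:
    return "policy_action"


def trust_stub(obs: Observation) -> float:
    # Toy rule: read the trust score directly from the first feature.
    return obs[0]


nav = SwitchingNavigator(classical_stub, neural_stub, trust_stub)
print(nav.act([0.9]))
print(nav.act([0.1]))
```

In this toy setup the switch depends only on the first observation feature; in a learned variant, `trust_neural` would be replaced by a trained model conditioned on the scene.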
