For robotic vehicles to navigate robustly and safely in unseen environments, choosing the most suitable navigation policy is crucial. However, most existing deep-reinforcement-learning-based navigation policies are trained with a hand-engineered curriculum and reward function, which makes them difficult to deploy across a wide range of real-world scenarios. In this paper, we propose a framework that learns a family of low-level navigation policies together with a high-level policy for deploying them. The main idea is that, instead of learning a single navigation policy with a fixed reward function, we simultaneously learn a family of policies that exhibit different behaviors under a wide range of reward functions. We then train the high-level policy to adaptively deploy the most suitable navigation skill. We evaluate our approach in simulation and in the real world, and demonstrate that our method can learn diverse navigation skills and deploy them adaptively. We also show that the proposed hierarchical learning framework offers explainability by providing semantics for the behavior of an autonomous agent.
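The abstract describes the hierarchy only at a high level. The sketch below illustrates the structure under stated assumptions and is not the authors' implementation. It assumes a linear reward built from hypothetical terms (`progress`, `collision`, `smoothness`). Each low-level skill is represented only by the reward-weight vector it would be trained with, and `simulate_episode` is a hypothetical toy outcome model. The learned high-level policy is replaced by a simple epsilon-greedy tabular selector (`HighLevelSelector`). All names are illustrative.

```python
import random
from dataclasses import dataclass, field

# Hypothetical per-step reward terms; the framework's actual terms are not
# specified in the abstract and are assumed here for illustration only.
TERMS = ("progress", "collision", "smoothness")


def reward(weights: dict, terms: dict) -> float:
    """Assumed linear reward: weighted sum of the individual reward terms."""
    return sum(weights[k] * terms[k] for k in TERMS)


# A discrete family of low-level skills, each identified by the reward
# weights it would (hypothetically) be trained with. A real skill would be
# a neural policy; here only its weight vector is kept.
SKILLS = [
    {"progress": 0.8, "collision": 0.1, "smoothness": 0.1},  # aggressive
    {"progress": 0.3, "collision": 0.6, "smoothness": 0.1},  # cautious
]

# Task objective the high-level selector is assumed to maximize.
TASK_WEIGHTS = {"progress": 1.0, "collision": 1.0, "smoothness": 1.0}


def simulate_episode(context: str, skill: dict) -> dict:
    """Toy outcome model (hypothetical): skills that favor progress drive
    faster, which pays off in open space but causes collisions in clutter."""
    speed = skill["progress"]
    return {
        "progress": speed,
        "collision": -2.0 * speed if context == "cluttered" else 0.0,
        "smoothness": -0.1 * speed,
    }


@dataclass
class HighLevelSelector:
    """Epsilon-greedy tabular selector mapping a context to a skill index.

    Stand-in for a learned high-level policy: it keeps the running mean of
    the task return obtained by each (context, skill) pair.
    """
    n_skills: int
    epsilon: float = 0.1
    values: dict = field(default_factory=dict)
    counts: dict = field(default_factory=dict)

    def select(self, context: str, rng: random.Random) -> int:
        if rng.random() < self.epsilon:
            return rng.randrange(self.n_skills)  # explore
        # Exploit; untried pairs default to +inf so each skill is tried
        # at least once per context (optimistic initialization).
        return max(range(self.n_skills),
                   key=lambda i: self.values.get((context, i), float("inf")))

    def update(self, context: str, skill_idx: int, ret: float) -> None:
        key = (context, skill_idx)
        self.counts[key] = self.counts.get(key, 0) + 1
        old = self.values.get(key, 0.0)
        self.values[key] = old + (ret - old) / self.counts[key]  # running mean


def train(episodes: int = 500, seed: int = 0) -> HighLevelSelector:
    """Alternate between two contexts and learn which skill to deploy."""
    rng = random.Random(seed)
    selector = HighLevelSelector(n_skills=len(SKILLS))
    for ep in range(episodes):
        context = "open" if ep % 2 == 0 else "cluttered"
        idx = selector.select(context, rng)
        terms = simulate_episode(context, SKILLS[idx])
        selector.update(context, idx, reward(TASK_WEIGHTS, terms))
    return selector


if __name__ == "__main__":
    selector = train()
    selector.epsilon = 0.0  # deploy greedily
    rng = random.Random(1)
    print(selector.select("open", rng), selector.select("cluttered", rng))
    # prints: 0 1  (aggressive skill in open space, cautious in clutter)
```

In this toy setting the selector learns to deploy the fast skill in open space and the cautious one in clutter. The chosen skill index, read together with its reward weights, is also one way to attach readable semantics to the agent's behavior.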