For robotic vehicles to navigate robustly and safely in unseen environments, it is crucial to select the most suitable navigation policy. However, most existing navigation policies based on deep reinforcement learning are trained with a hand-engineered curriculum and reward function, which makes them difficult to deploy across a wide range of real-world scenarios. In this paper, we propose a framework that learns a family of low-level navigation policies together with a high-level policy for deploying them. The main idea is that, instead of learning a single navigation policy with a fixed reward function, we simultaneously learn a family of policies that exhibit different behaviors under a wide range of reward functions. We then train a high-level policy that adaptively deploys the most suitable navigation skill. We evaluate our approach in simulation and in the real world and demonstrate that it can learn diverse navigation skills and deploy them adaptively. We also show that the proposed hierarchical learning framework offers explainability by providing semantics for the behavior of an autonomous agent.
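To make the two-level structure concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each low-level skill is indexed by a vector of reward weights and that the high-level policy outputs a skill index. The names `RewardWeights`, `reward`, `low_level_speed`, and `high_level_select`, the two reward terms, and the 0.5 distance threshold are all hypothetical. Hand-written rules stand in for what would be trained networks in practice.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical reward terms; the abstract does not specify the actual ones.
    progress: float  # weight on progress toward the goal
    safety: float    # weight on penalizing obstacle proximity


def reward(w: RewardWeights, progress: float, min_obstacle_dist: float) -> float:
    """Weighted sum of a progress term and an obstacle-proximity penalty."""
    proximity_penalty = max(0.0, 1.0 - min_obstacle_dist)
    return w.progress * progress - w.safety * proximity_penalty


def low_level_speed(w: RewardWeights, min_obstacle_dist: float) -> float:
    """Stand-in for a learned low-level policy conditioned on w.

    Returns a forward speed in [0, 1]. A larger safety weight slows the
    vehicle more near obstacles. A real policy would be a trained network.
    """
    caution = w.safety / (w.progress + w.safety)
    proximity = max(0.0, 1.0 - min_obstacle_dist)
    return max(0.0, 1.0 - caution * proximity)


def high_level_select(skills: Sequence[RewardWeights], min_obstacle_dist: float) -> int:
    """Stand-in for the learned high-level policy: return a skill index.

    A fixed threshold rule (assumed, 0.5) replaces the learned selector:
    the most cautious skill in clutter, the most aggressive in open space.
    """
    by_safety = sorted(range(len(skills)), key=lambda i: skills[i].safety)
    return by_safety[-1] if min_obstacle_dist < 0.5 else by_safety[0]


# A small family of skills, one per reward-weight setting (illustrative values).
SKILLS = [
    RewardWeights(progress=1.0, safety=0.1),  # aggressive
    RewardWeights(progress=1.0, safety=1.0),  # balanced
    RewardWeights(progress=1.0, safety=5.0),  # cautious
]
```

Because every skill corresponds to a named reward-weight setting, the index chosen by the high-level policy can be read as a label such as "aggressive" or "cautious". This is one plausible reading of the semantics that the abstract credits with providing explainability.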