Hierarchical reinforcement learning (HRL) incorporates temporal abstraction into reinforcement learning (RL) by explicitly exploiting hierarchical structure. Modern HRL typically designs a hierarchical agent composed of a high-level policy and a set of low-level policies. The high-level policy selects, at a lower frequency, which low-level policy to activate, and the activated low-level policy selects an action at each time step. Recent HRL algorithms have achieved performance gains over standard RL algorithms in synthetic navigation tasks. However, these HRL algorithms cannot be applied directly to real-world navigation tasks. One of the main challenges is that real-world navigation tasks require an agent to perform safe and interactive behaviors in dynamic environments. In this paper, we propose imagination-augmented HRL (IAHRL), which efficiently integrates imagination into HRL to enable an agent to learn safe and interactive behaviors in real-world navigation tasks. Here, imagination means predicting the consequences of actions without interacting with the actual environment. The key idea behind IAHRL is that the low-level policies imagine safe and structured behaviors, and the high-level policy then infers interactions with surrounding objects by interpreting the imagined behaviors. We also introduce a new attention mechanism that makes our high-level policy permutation-invariant to the order of surrounding objects and lets it prioritize our agent over them. To evaluate IAHRL, we introduce five complex urban driving tasks, which are among the most challenging real-world navigation tasks. The experimental results indicate that IAHRL enables an agent to perform safe and interactive behaviors, achieving higher success rates and fewer average episode steps than the baselines.
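To make the permutation-invariance property concrete, the following is a minimal sketch of one generic way to obtain it; it is not the attention mechanism proposed in the paper. It assumes plain scaled dot-product attention with no learned projections, in which the agent's own feature vector serves as the single query over itself and the surrounding objects. The function name `ego_attention` and the feature vectors are hypothetical and chosen only for illustration.

```python
import math


def ego_attention(ego, others):
    """Scaled dot-product attention with the ego feature as the only query.

    Illustrative assumption, not the paper's mechanism.
    ego:    list of floats, feature vector of the agent itself.
    others: list of feature vectors, one per surrounding object.
    Returns the attention-weighted sum over the ego and all objects.
    """
    entities = [ego] + list(others)  # the ego attends to itself and to every object
    scale = math.sqrt(len(ego))
    scores = [sum(q * k for q, k in zip(ego, e)) / scale for e in entities]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Softmax followed by a weighted sum does not depend on the order of the entities.
    return [sum(w * e[i] for w, e in zip(weights, entities)) for i in range(len(ego))]


if __name__ == "__main__":
    ego = [1.0, 0.0]
    objects = [[0.5, 0.2], [-0.3, 0.9], [0.1, -0.4]]
    forward = ego_attention(ego, objects)
    shuffled = ego_attention(ego, objects[::-1])
    # The two results agree up to floating-point rounding.
    print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(forward, shuffled)))
```

Using the ego feature as the query is one simple way to keep the agent in a privileged role while treating the surrounding objects as an unordered set; how IAHRL actually prioritizes the agent is not specified in the abstract.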