Hierarchical reinforcement learning (HRL) proposes to solve difficult tasks by performing decision-making and control at successively higher levels of temporal abstraction. However, off-policy HRL often suffers from a non-stationary high-level policy, because the low-level policy is constantly changing. In this paper, we propose a novel HRL approach that mitigates this non-stationarity by adversarially constraining the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy. In practice, the adversarial learning is implemented by training, concurrently with the high-level policy, a simple state-conditioned discriminator network that determines the compatibility level of subgoals. Comparisons with state-of-the-art algorithms show that our approach improves both learning efficiency and performance in challenging continuous control tasks.
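To make the role of the discriminator concrete, the following is a minimal toy sketch and not the implementation evaluated in the paper. It assumes 1-D states, a logistic-regression discriminator on the squared state-to-subgoal distance, and synthetic data in which subgoals within distance 1 of the state stand in for those compatible with the current low-level policy; the names `SubgoalDiscriminator` and `adversarial_subgoal_loss`, as well as the choice of positive and negative samples, are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SubgoalDiscriminator:
    """Toy state-conditioned discriminator D(s, g) on 1-D states (hypothetical)."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def prob(self, state, subgoal):
        # Assumed feature: squared distance between state and subgoal.
        feature = (subgoal - state) ** 2
        return sigmoid(self.w * feature + self.b)

    def fit(self, positives, negatives, lr=0.05, epochs=500):
        # Full-batch gradient ascent on the binary log-likelihood.
        data = [(s, g, 1.0) for s, g in positives]
        data += [(s, g, 0.0) for s, g in negatives]
        for _ in range(epochs):
            grad_w = 0.0
            grad_b = 0.0
            for s, g, y in data:
                err = y - self.prob(s, g)
                grad_w += err * (g - s) ** 2
                grad_b += err
            self.w += lr * grad_w / len(data)
            self.b += lr * grad_b / len(data)


def adversarial_subgoal_loss(disc, state, subgoal, eps=1e-12):
    # Extra high-level loss term: small when D rates the subgoal as compatible.
    return -math.log(disc.prob(state, subgoal) + eps)


rng = random.Random(0)
states = [rng.uniform(-10, 10) for _ in range(200)]
# Assumption: "compatible" subgoals lie within distance 1 of the state.
reached = [(s, s + rng.uniform(-1, 1)) for s in states]
# Assumption: the untrained high-level policy proposes subgoals up to 5 away.
proposed = [(s, s + rng.uniform(-5, 5)) for s in states]

disc = SubgoalDiscriminator()
disc.fit(reached, proposed)

near_loss = adversarial_subgoal_loss(disc, 0.0, 0.5)
far_loss = adversarial_subgoal_loss(disc, 0.0, 4.0)
print(near_loss < far_loss)
```

In a full HRL agent, a term of this kind would be added to the high-level policy objective and the discriminator would be refit as the low-level policy changes; the weighting between the two terms is left unspecified here.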