Hierarchical Imitation Learning (HIL) has been proposed to recover highly complex behaviors in long-horizon tasks from expert demonstrations by modeling the task hierarchy with the option framework. Existing methods either overlook the causal relationship between a subtask and its corresponding policy or cannot learn the policy in an end-to-end fashion, which leads to suboptimal performance. In this work, we develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning and adapt it with the Expectation-Maximization algorithm to recover a hierarchical policy directly from unannotated demonstrations. Further, we add a directed information term to the objective function to strengthen the causal relationship between subtasks and their policies, and propose a Variational Autoencoder framework for optimizing our objectives end-to-end. We provide theoretical justifications and evaluations on challenging robotic control tasks to show the superiority of our algorithm. The code is available at https://github.com/LucasCJYSDL/HierAIRL.