Hierarchical Imitation Learning (HIL) has been proposed to recover highly complex behaviors in long-horizon tasks from expert demonstrations by modeling the task hierarchy with the option framework. Existing methods either overlook the causal relationship between a subtask and its corresponding policy or cannot learn the policy in an end-to-end fashion, which leads to suboptimal performance. In this work, we develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning and adapt it with the Expectation-Maximization algorithm to recover a hierarchical policy directly from unannotated demonstrations. Further, we introduce a directed information term into the objective function to strengthen this causal relationship and propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion. We provide theoretical justifications and evaluations on challenging robotic control tasks to show the superiority of our algorithm. The code is available at https://github.com/LucasCJYSDL/HierAIRL.