Multi-task Imitation Learning (MIL) aims to train a policy capable of performing a distribution of tasks based on multi-task expert demonstrations, which is essential for general-purpose robots. Existing MIL algorithms suffer from low data efficiency and poor performance on complex long-horizon tasks. We develop Multi-task Hierarchical Adversarial Inverse Reinforcement Learning (MH-AIRL) to learn hierarchically structured multi-task policies, which are better suited to compositional long-horizon tasks and use expert data more efficiently by identifying and transferring reusable basic skills across tasks. To realize this, MH-AIRL synthesizes context-based multi-task learning, AIRL (an IL approach), and hierarchical policy learning. Further, MH-AIRL can be applied to demonstrations without task or skill annotations (i.e., state-action pairs only), which are more accessible in practice. Theoretical justifications are provided for each module of MH-AIRL, and evaluations on challenging multi-task settings demonstrate superior performance and transferability of the multi-task policies learned with MH-AIRL compared to SOTA MIL baselines.