Multi-task Imitation Learning (MIL) aims to train a policy capable of performing a distribution of tasks based on multi-task expert demonstrations, which is essential for general-purpose robots. Existing MIL algorithms suffer from low data efficiency and poor performance on complex long-horizon tasks. We develop Multi-task Hierarchical Adversarial Inverse Reinforcement Learning (MH-AIRL) to learn hierarchically structured multi-task policies, which are better suited to compositional tasks with long horizons and make more efficient use of expert data by identifying and transferring reusable basic skills across tasks. To realize this, MH-AIRL synthesizes context-based multi-task learning, AIRL (an IL approach), and hierarchical policy learning. Further, MH-AIRL can be applied to demonstrations without task or skill annotations (i.e., state-action pairs only), which are more accessible in practice. Theoretical justifications are provided for each module of MH-AIRL, and evaluations on challenging multi-task settings demonstrate superior performance and transferability of the multi-task policies learned with MH-AIRL compared to SOTA MIL baselines.