Imitation learning trains a policy from demonstrations without requiring hand-designed reward functions. In many robotic tasks, such as autonomous racing, imitated policies must model complex environment dynamics and human decision-making. Sequence modeling is highly effective at capturing intricate patterns in motion sequences but struggles to adapt to the new environments and distribution shifts that are common in real-world robotics tasks. In contrast, Adversarial Imitation Learning (AIL) can mitigate this effect, but it is sample-inefficient and has difficulty handling complex motion patterns. Thus, we propose BeTAIL: Behavior Transformer Adversarial Imitation Learning, which combines a Behavior Transformer (BeT) policy learned from human demonstrations with online AIL. BeTAIL adds an AIL residual policy to the BeT policy: the BeT models the sequential decision-making process of human experts, and the residual corrects for out-of-distribution states or shifts in environment dynamics. We test BeTAIL on three challenges with expert-level demonstrations of real human gameplay in Gran Turismo Sport. Our proposed residual BeTAIL reduces environment interactions and improves racing performance and stability, even when the BeT is pretrained on different tracks than those used for downstream learning. Videos and code are available at: https://sites.google.com/berkeley.edu/BeTAIL/home.
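
To make the residual idea concrete, the sketch below shows one plausible way a residual correction could be added to a base policy's action. It is an illustration under stated assumptions, not the paper's implementation: the function name `combine_actions`, the residual bound `alpha`, the action range `[-1, 1]`, and the two-dimensional `[steering, throttle]` action layout are all hypothetical and do not come from the abstract.

```python
import numpy as np


def combine_actions(base_action, residual_action, alpha=0.2, low=-1.0, high=1.0):
    """Add a bounded residual correction to a base-policy action.

    Assumptions (hypothetical, for illustration only):
      - base_action stands in for the output of a pretrained sequence model.
      - residual_action stands in for the output of a residual policy
        trained online.
      - alpha bounds the residual so it can only nudge the base action.
    """
    base = np.asarray(base_action, dtype=float)
    # Limit the residual to [-alpha, alpha] per action dimension.
    residual = np.clip(np.asarray(residual_action, dtype=float), -alpha, alpha)
    # Keep the final command inside the assumed actuator range.
    return np.clip(base + residual, low, high)


# Hypothetical 2-D action: [steering, throttle]
base = np.array([0.5, 0.95])
residual = np.array([-0.5, 0.1])
action = combine_actions(base, residual, alpha=0.2)
```

Bounding the residual is one common design choice for residual policies: it keeps the combined behavior close to the pretrained policy while still letting the online learner correct small errors. Whether and how the bound is applied here is an assumption of this sketch.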