Imitation Learning (IL) enables agents to mimic expert behavior by learning from demonstrations. However, traditional IL methods require large amounts of medium- to high-quality demonstrations, along with the expert's actions, both of which are often unavailable. To reduce these requirements, we propose Latent Wasserstein Adversarial Imitation Learning (LWAIL), a novel adversarial imitation learning framework that performs state-only distribution matching using the Wasserstein distance computed in a dynamics-aware latent space. Unlike prior work, this latent space is obtained via a pre-training stage in which we train an Intention Conditioned Value Function (ICVF) on a small set of randomly generated state-only data to capture the dynamics-aware structure of the state space. We show that this enhances the policy's understanding of state transitions, enabling the learning process to reach expert-level performance from only one or a few state-only expert episodes. Through experiments on multiple MuJoCo environments, we demonstrate that our method outperforms prior Wasserstein-based and adversarial IL methods, achieving better results across a range of tasks.
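The pipeline described above — a frozen, pretrained latent encoder plus a Wasserstein-style critic over latent states that supplies a state-only reward — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the encoder here is a hypothetical fixed random projection standing in for the pretrained ICVF representation, and the critic is a linear WGAN-style surrogate for the Wasserstein-1 dual, kept approximately Lipschitz by weight clipping.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, LATENT_DIM = 4, 8

# Frozen "dynamics-aware" encoder standing in for the pretrained ICVF
# representation (hypothetical: a fixed random projection for illustration).
W_enc = rng.normal(size=(STATE_DIM, LATENT_DIM)) / np.sqrt(STATE_DIM)

def encode(states):
    return np.tanh(states @ W_enc)

# Linear critic f(z) = z @ w, kept approximately 1-Lipschitz by weight
# clipping (a WGAN-style surrogate for the Wasserstein-1 dual objective).
w = np.zeros(LATENT_DIM)

def critic_update(expert_z, policy_z, lr=0.1, clip=1.0):
    global w
    # Ascend E[f(z_expert)] - E[f(z_policy)] in the latent space.
    grad = expert_z.mean(axis=0) - policy_z.mean(axis=0)
    w = np.clip(w + lr * grad, -clip, clip)

def reward(states):
    # Higher critic score => state looks more expert-like; this scalar
    # serves as the state-only pseudo-reward for the policy learner.
    return encode(states) @ w

# Toy data: expert states cluster near +1, policy states near -1.
expert_states = rng.normal(loc=1.0, scale=0.1, size=(256, STATE_DIM))
policy_states = rng.normal(loc=-1.0, scale=0.1, size=(256, STATE_DIM))

for _ in range(50):
    critic_update(encode(expert_states), encode(policy_states))

# After training, expert-like states receive higher pseudo-rewards,
# which an RL algorithm would then maximize.
print(reward(expert_states).mean() > reward(policy_states).mean())
```

In the full method, the policy is trained with reinforcement learning against this pseudo-reward while the critic is updated adversarially; no expert actions are ever needed, since both the encoder and the critic operate on states alone.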