Imitation learning from demonstrations (ILD) aims to alleviate numerous shortcomings of reinforcement learning through the use of demonstrations. However, in most real-world applications, expert action guidance is absent, making the use of ILD impossible. Instead, we consider imitation learning from observations (ILO), where no expert actions are provided, which makes the problem significantly more challenging. Existing methods often employ on-policy learning, which is known to be sample-inefficient. This paper presents SEILO, a novel sample-efficient on-policy algorithm for ILO that combines standard adversarial imitation learning with inverse dynamics modeling. This approach enables the agent to receive feedback from both the adversarial procedure and a behavior cloning loss. We empirically demonstrate that our proposed algorithm requires fewer interactions with the environment to achieve expert performance than other state-of-the-art on-policy ILO and ILD methods.
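The two feedback signals named in the abstract can be illustrated with a toy sketch. This is not the SEILO implementation: the abstract gives no architectural details, so the linear models, the toy dynamics `s' = s + a`, the least-squares inverse dynamics fit, the GAIL-style reward form, and every function name below are assumptions chosen only to make the idea concrete. An inverse dynamics model is fitted on the agent's own transitions, used to infer the missing expert actions from expert observations, and those inferred actions then serve as targets for a behavior cloning loss. A discriminator over state transitions supplies the adversarial reward.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_inverse_dynamics(states, actions, next_states):
    """Fit a linear inverse dynamics model a ≈ [s, s', 1] @ W by least squares."""
    X = np.hstack([states, next_states, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, actions, rcond=None)
    return W


def infer_actions(W, states, next_states):
    """Label observation-only transitions with actions predicted by the model."""
    X = np.hstack([states, next_states, np.ones((len(states), 1))])
    return X @ W


def bc_loss(policy_W, states, target_actions):
    """Mean squared error between a linear policy's actions and the targets."""
    pred = states @ policy_W
    return float(np.mean((pred - target_actions) ** 2))


def discriminator_reward(theta, states, next_states):
    """GAIL-style reward -log(1 - D(s, s')) with a logistic discriminator D."""
    X = np.hstack([states, next_states])
    d = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return -np.log(1.0 - d + 1e-8)


# Toy environment (assumption): next state is state plus action.
dim = 3
agent_s = rng.normal(size=(256, dim))
agent_a = rng.normal(size=(256, dim))
agent_s_next = agent_s + agent_a

# 1) Learn inverse dynamics from the agent's own experience, where actions are known.
W_idm = fit_inverse_dynamics(agent_s, agent_a, agent_s_next)

# 2) The expert follows a = -0.5 * s, but only (s, s') pairs are observed.
expert_s = rng.normal(size=(64, dim))
expert_s_next = expert_s + (-0.5 * expert_s)
expert_a_hat = infer_actions(W_idm, expert_s, expert_s_next)

# 3) Behavior cloning loss against the inferred expert actions.
loss_before = bc_loss(np.zeros((dim, dim)), expert_s, expert_a_hat)
loss_after = bc_loss(-0.5 * np.eye(dim), expert_s, expert_a_hat)

# 4) Adversarial reward for agent transitions from an untrained discriminator.
theta = 0.1 * rng.normal(size=2 * dim)
rewards = discriminator_reward(theta, agent_s, agent_s_next)
```

In a full training loop, the policy update would combine the reinforcement learning objective driven by `rewards` with the behavior cloning term, for example as a weighted sum. How the two are weighted and scheduled is not stated in the abstract, so any such coefficient would be a further assumption.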