Imitation learning aims to solve the problem of defining reward functions in real-world decision-making tasks. The currently popular approach is the Adversarial Imitation Learning (AIL) framework, which matches expert state-action occupancy measures to obtain a surrogate reward for forward reinforcement learning. However, the traditional discriminator is a simple binary classifier that does not learn an accurate distribution, so it may fail to identify expert-level state-action pairs induced by the policy interacting with the environment. To address this issue, we propose diffusion adversarial imitation learning (DiffAIL), which introduces the diffusion model into the AIL framework. Specifically, DiffAIL models state-action pairs with an unconditional diffusion model and uses the diffusion loss as part of the discriminator's learning objective, which enables the discriminator to better capture expert demonstrations and improves generalization. Experimentally, our method achieves state-of-the-art performance and significantly surpasses the expert demonstrations on two benchmark tasks, in both the standard state-action setting and the state-only setting. Our code is available at https://github.com/ML-Group-SDU/DiffAIL.
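To make the idea of using a diffusion loss inside the discriminator more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the exact objective, so several choices here are assumptions: the mapping `D = exp(-diffusion_loss)`, the binary cross-entropy form of the objective, the reward `-log(1 - D)`, the linear noise schedule, and the placeholder linear noise predictor `eps_model`. All function and variable names are hypothetical.

```python
import math
import random

T = 10  # number of diffusion steps (hypothetical value)
# Linear beta schedule and cumulative products alpha_bar_t (an assumption).
BETAS = [1e-4 + (0.2 - 1e-4) * i / (T - 1) for i in range(T)]
ALPHA_BARS = []
_prod = 1.0
for _b in BETAS:
    _prod *= 1.0 - _b
    ALPHA_BARS.append(_prod)


def init_weights(dim, rng):
    """Random weights for a placeholder linear noise predictor."""
    return [[rng.gauss(0.0, 0.1) for _ in range(dim + 1)] for _ in range(dim)]


def eps_model(weights, x_t, t):
    """Placeholder eps_phi(x_t, t): linear in the noisy input and t / T."""
    feats = x_t + [t / T]
    return [sum(w * f for w, f in zip(row, feats)) for row in weights]


def diffusion_loss(weights, x0, rng):
    """Denoising loss for one state-action vector x0 = concat(s, a)."""
    t = rng.randrange(T)                      # random diffusion step
    eps = [rng.gauss(0.0, 1.0) for _ in x0]   # Gaussian noise to predict
    ab = ALPHA_BARS[t]
    x_t = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    pred = eps_model(weights, x_t, t)
    return sum((e - p) ** 2 for e, p in zip(eps, pred)) / len(x0)


def discriminator(weights, x0, rng):
    """Assumed mapping: low diffusion loss gives D close to 1 (expert-like)."""
    return math.exp(-diffusion_loss(weights, x0, rng))


def discriminator_objective(weights, expert_batch, policy_batch, rng, tiny=1e-8):
    """Binary cross-entropy with D built from the diffusion loss (to minimize)."""
    log_d_expert = [math.log(max(discriminator(weights, x, rng), tiny))
                    for x in expert_batch]
    log_not_d_policy = [math.log(max(1.0 - discriminator(weights, x, rng), tiny))
                        for x in policy_batch]
    return -(sum(log_d_expert) / len(log_d_expert)
             + sum(log_not_d_policy) / len(log_not_d_policy))


def surrogate_reward(weights, x0, rng, tiny=1e-8):
    """One common AIL reward form, -log(1 - D); an assumption here."""
    return -math.log(max(1.0 - discriminator(weights, x0, rng), tiny))
```

In an actual training loop the weights of the noise predictor would be updated by gradient descent on `discriminator_objective`, and `surrogate_reward` would be passed to the forward reinforcement learning step. The sketch only evaluates these quantities and does not perform any optimization.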