Imitation learning aims to learn a policy from expert demonstrations without access to reward signals from the environment. Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning: a generator policy learns to imitate expert behaviors, while a discriminator learns to distinguish expert demonstrations from agent trajectories. Despite its encouraging results, GAIL training is often brittle and unstable. Inspired by the recent dominance of diffusion models in generative modeling, this work proposes Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL, aiming to yield more precise and smoother rewards for policy learning. Specifically, we propose a diffusion discriminative classifier to construct an enhanced discriminator, and then design diffusion rewards based on the classifier's output for policy learning. We conduct extensive experiments in navigation, manipulation, and locomotion, verifying DRAIL's effectiveness compared to prior imitation learning methods. Additional experimental results demonstrate the generalizability and data efficiency of DRAIL. Visualizations of the reward functions learned by GAIL and DRAIL suggest that DRAIL produces more precise and smoother rewards.
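To make the adversarial setup concrete, the following is a minimal sketch of the standard GAIL-style discriminator objective and one commonly used reward, r = -log(1 - D(s, a)). It does not implement DRAIL's diffusion discriminative classifier, whose details are not given in this abstract. The function names and the use of raw discriminator logits as inputs are assumptions made for illustration only.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(expert_logits, agent_logits):
    """Binary cross-entropy with expert pairs labeled 1 and agent pairs labeled 0.

    Each logit z is the discriminator's pre-sigmoid score for one
    (state, action) pair, so D = sigmoid(z).
    """
    # -log D(s, a) averaged over expert pairs; equals softplus(-z).
    expert_term = sum(softplus(-z) for z in expert_logits) / len(expert_logits)
    # -log(1 - D(s, a)) averaged over agent pairs; equals softplus(z).
    agent_term = sum(softplus(z) for z in agent_logits) / len(agent_logits)
    return expert_term + agent_term


def gail_reward(logit):
    """One common GAIL reward choice (an assumption here): -log(1 - D(s, a))."""
    return softplus(logit)


if __name__ == "__main__":
    # Hypothetical logits: the discriminator scores expert pairs higher.
    expert = [2.0, 1.5, 3.0]
    agent = [-1.0, 0.5, -2.0]
    print("discriminator loss:", discriminator_loss(expert, agent))
    print("rewards for agent pairs:", [gail_reward(z) for z in agent])
```

Under this reward, agent pairs that the discriminator judges more expert-like (higher logit) receive a larger reward, which is the signal the generator policy is trained to maximize.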