Imitation learning addresses the challenge of learning from an expert's demonstrations without access to reward signals from the environment. Behavioral cloning (BC) formulates imitation learning as a supervised learning problem and learns from sampled state-action pairs. Despite its simplicity, BC often fails to capture the temporal structure of the task and the global information in the expert demonstrations. This work aims to augment BC by employing diffusion models to model expert behaviors and by designing a learning objective that uses the learned diffusion models to guide policy learning. To this end, we propose diffusion model-augmented behavioral cloning (Diffusion-BC), which combines our proposed diffusion model-guided learning objective with the BC objective so that the two complement each other. Our proposed method outperforms the baselines or achieves competitive performance in various continuous control domains, including navigation, robot arm manipulation, and locomotion. Ablation studies justify our design choices and investigate the effect of balancing the BC objective and our proposed diffusion model objective.
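To make the idea of balancing the two objectives concrete, the following is a minimal sketch, not the authors' implementation. It assumes the BC term is a mean squared error on actions, that the diffusion term is a standard DDPM-style noise-prediction error evaluated on the concatenated state and predicted action, and that the two are mixed with a scalar weight `lam`. The names `bc_loss`, `diffusion_loss`, `combined_loss`, `eps_model`, and `alpha_bar` are hypothetical and do not come from the abstract.

```python
import math
import random


def bc_loss(pred_action, expert_action):
    """Mean squared error between predicted and expert actions (assumed BC term)."""
    n = len(expert_action)
    return sum((p - e) ** 2 for p, e in zip(pred_action, expert_action)) / n


def diffusion_loss(eps_model, state, action, alpha_bar, rng):
    """DDPM-style noise-prediction error on a concatenated (state, action) pair.

    eps_model(x_t, t) is a hypothetical pretrained noise predictor;
    alpha_bar[t] is the cumulative noise-schedule product at step t.
    """
    x0 = list(state) + list(action)
    t = rng.randrange(len(alpha_bar))            # sample a diffusion step
    eps = [rng.gauss(0.0, 1.0) for _ in x0]      # sample Gaussian noise
    a = alpha_bar[t]
    # Forward (noising) process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    eps_hat = eps_model(x_t, t)
    return sum((e - h) ** 2 for e, h in zip(eps, eps_hat)) / len(x0)


def combined_loss(pred_action, expert_action, state, eps_model, alpha_bar, lam, rng):
    """Weighted sum of the BC term and the diffusion-guided term (assumed form)."""
    guided = diffusion_loss(eps_model, state, pred_action, alpha_bar, rng)
    return bc_loss(pred_action, expert_action) + lam * guided


if __name__ == "__main__":
    rng = random.Random(0)
    zero_model = lambda x_t, t: [0.0] * len(x_t)  # placeholder noise predictor
    loss = combined_loss([0.1, 0.2], [0.0, 0.2], [1.0, -1.0, 0.5],
                         zero_model, [0.9, 0.5, 0.1], lam=0.5, rng=rng)
    print(loss)
```

Setting `lam` to zero recovers plain BC in this sketch, while larger values give more weight to the diffusion-guided term. This is the kind of trade-off the ablation on balancing the two objectives would examine.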