Imitation learning empowers artificial agents to mimic behavior by learning from demonstrations. Recently, diffusion models, which can model high-dimensional and multimodal distributions, have shown impressive performance on imitation learning tasks. These models learn to shape a policy by diffusing actions (or states) from standard Gaussian noise. However, the target policy to be learned is often significantly different from Gaussian, and this mismatch can result in poor performance when only a small number of diffusion steps is used (to improve inference speed) or when data is limited. The key idea in this work is that starting from a source more informative than Gaussian noise enables diffusion methods to mitigate these limitations. We contribute theoretical results, a new method, and empirical findings that show the benefits of using an informative source policy. Our method, which we call BRIDGER, leverages the stochastic interpolants framework to bridge arbitrary policies, thus enabling a flexible approach to imitation learning. It generalizes prior work in that a standard Gaussian can still be used as the source, but other source policies can be used if available. In experiments on challenging simulation benchmarks and on real robots, BRIDGER outperforms state-of-the-art diffusion policies. We provide further analysis of design considerations when applying BRIDGER.
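As a rough illustration of what bridging a source policy and a target policy can look like, the sketch below samples a point on a generic linear stochastic interpolant, x_t = (1 - t) x_0 + t x_1 + γ(t) z with z drawn from a standard Gaussian. This is not BRIDGER's actual interpolant or training procedure: the function name `interpolate`, the noise schedule γ(t) = sqrt(2t(1 - t)), and the example actions are all assumptions chosen for illustration. The only property it is meant to show is that the path starts exactly at a source sample and ends exactly at a target sample, with noise injected only in between.

```python
import math
import random


def interpolate(x0, x1, t, rng):
    """Sample x_t on a linear stochastic interpolant between x0 and x1.

    x_t = (1 - t) * x0 + t * x1 + gamma(t) * z,  z ~ N(0, I),
    with gamma(t) = sqrt(2 t (1 - t)), an illustrative schedule
    whose noise vanishes at t = 0 and t = 1.
    """
    gamma = math.sqrt(2.0 * t * (1.0 - t))
    return [
        (1.0 - t) * a + t * b + gamma * rng.gauss(0.0, 1.0)
        for a, b in zip(x0, x1)
    ]


rng = random.Random(0)
source_action = [0.0, 0.0]   # hypothetical sample from an informative source policy
target_action = [1.0, -1.0]  # hypothetical demonstrated (target) action

start = interpolate(source_action, target_action, 0.0, rng)   # equals the source sample
middle = interpolate(source_action, target_action, 0.5, rng)  # noisy point between the two
end = interpolate(source_action, target_action, 1.0, rng)     # equals the target sample
```

Replacing `source_action` with a draw from a standard Gaussian recovers the usual noise-to-data setting, which mirrors the abstract's remark that a Gaussian source remains a special case.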