Imitation learning enables artificial agents to mimic behavior by learning from demonstrations. Recently, diffusion models, which can represent high-dimensional and multimodal distributions, have shown impressive performance on imitation learning tasks. These models learn to shape a policy by diffusing actions (or states) from standard Gaussian noise. However, the target policy is often significantly different from a Gaussian. This mismatch can degrade performance when data is limited, or when only a small number of diffusion steps is used to speed up inference. The key idea in this work is that starting from a more informative source than a Gaussian lets diffusion methods mitigate these limitations. We contribute theoretical results, a new method, and empirical findings that show the benefits of using an informative source policy. Our method, which we call BRIDGER, leverages the stochastic interpolants framework to bridge arbitrary policies, enabling a flexible approach to imitation learning. It generalizes prior work: a standard Gaussian can still serve as the source, but other source policies can be used when available. In experiments on challenging simulation benchmarks and on real robots, BRIDGER outperforms state-of-the-art diffusion policies. We also analyze design considerations that arise when applying BRIDGER. https://clear-nus.github.io/blog/bridger
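The bridging idea can be illustrated with a minimal Python sketch. It assumes the linear stochastic interpolant x_t = (1 - t) x_0 + t x_1 + γ(t) z with γ(t) = sqrt(s · t(1 - t)), a common choice in the stochastic interpolants literature. The exact interpolant, noise schedule, and network that BRIDGER uses are not specified here, and the function names and the `scale` parameter are hypothetical. In the sketch, x_0 is an action drawn from the source policy (either a standard Gaussian sample or a more informative policy), x_1 is a demonstrated action, and a learned velocity field would be regressed onto the time derivative of x_t.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def gamma(t: float, scale: float = 1.0) -> float:
    # Hypothetical noise schedule gamma(t) = sqrt(scale * t * (1 - t)); it is zero at t = 0 and t = 1.
    return math.sqrt(scale * t * (1.0 - t))


def gamma_dot(t: float, scale: float = 1.0) -> float:
    # Time derivative of gamma(t); only finite for 0 < t < 1.
    return scale * (1.0 - 2.0 * t) / (2.0 * gamma(t, scale))


def interpolant(x0: Vector, x1: Vector, z: Vector, t: float, scale: float = 1.0) -> Vector:
    # x_t = (1 - t) * x0 + t * x1 + gamma(t) * z.
    # x0: action from the source policy, x1: demonstrated action, z: standard Gaussian noise.
    g = gamma(t, scale)
    return [(1.0 - t) * a + t * b + g * n for a, b, n in zip(x0, x1, z)]


def drift_target(x0: Vector, x1: Vector, z: Vector, t: float, scale: float = 1.0) -> Vector:
    # d/dt x_t = x1 - x0 + gamma_dot(t) * z; a velocity network b(t, x_t) would be regressed onto this.
    gd = gamma_dot(t, scale)
    return [b - a + gd * n for a, b, n in zip(x0, x1, z)]


def sample_training_example(
    x0: Vector, x1: Vector, rng: random.Random, eps: float = 1e-3, scale: float = 1.0
) -> Tuple[float, Vector, Vector]:
    # Draw t away from the endpoints, where gamma_dot is singular, and fresh Gaussian noise z.
    t = rng.uniform(eps, 1.0 - eps)
    z = [rng.gauss(0.0, 1.0) for _ in x0]
    return t, interpolant(x0, x1, z, t, scale), drift_target(x0, x1, z, t, scale)


if __name__ == "__main__":
    rng = random.Random(0)
    expert_action = [1.0, -0.5]
    # Gaussian source: x0 ~ N(0, I), the standard diffusion-policy setup.
    gaussian_x0 = [rng.gauss(0.0, 1.0) for _ in expert_action]
    # Informative source: a hypothetical prior policy whose action is already near the demonstration.
    informative_x0 = [0.9, -0.4]
    for name, x0 in [("gaussian", gaussian_x0), ("informative", informative_x0)]:
        t, xt, target = sample_training_example(x0, expert_action, rng)
        print(name, round(t, 3), xt, target)
```

Swapping the source only changes where x_0 comes from. Drawing it from N(0, I) recovers a Gaussian-source setup. Drawing it from an informative policy whose actions already lie near the demonstrations shrinks the x_1 - x_0 term the model has to transport, which plausibly explains why fewer integration steps may suffice.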