Imitation learning empowers artificial agents to mimic behavior by learning from demonstrations. Recently, diffusion models, which can model high-dimensional and multimodal distributions, have shown impressive performance on imitation learning tasks. These models learn to shape a policy by diffusing actions (or states) from standard Gaussian noise. However, the target policy to be learned often differs significantly from a Gaussian, and this mismatch can result in poor performance when only a small number of diffusion steps is used (to improve inference speed) and when data is limited. The key idea in this work is that starting from a source more informative than a Gaussian enables diffusion methods to overcome these limitations. We contribute theoretical results, a new method, and empirical findings that show the benefits of using an informative source policy. Our method, which we call BRIDGER, leverages the stochastic interpolants framework to bridge arbitrary policies, thus enabling a flexible approach to imitation learning. It generalizes prior work in that standard Gaussians can still be applied, but other source policies can be used if available. In experiments on challenging benchmarks, BRIDGER outperforms state-of-the-art diffusion policies. We also provide further analysis of design considerations when applying BRIDGER.
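To make the idea of bridging a source policy and a target policy concrete, the following is a minimal sketch of a stochastic interpolant between two sets of action samples. It is an illustration under stated assumptions, not BRIDGER's implementation: the linear interpolation with a sqrt(t(1 - t)) noise term is one common choice from the stochastic interpolants literature, and the function name `interpolant`, the `noise_scale` parameter, and the toy action arrays are all hypothetical.

```python
import numpy as np


def interpolant(x0, x1, t, rng, noise_scale=1.0):
    """Sample x_t = (1 - t) * x0 + t * x1 + noise_scale * sqrt(t * (1 - t)) * z.

    x0: samples from the source policy, x1: samples from the target policy,
    t: scalar time in [0, 1], z: standard Gaussian noise.
    The noise term vanishes at t = 0 and t = 1, so the endpoints are exact.
    """
    z = rng.standard_normal(x0.shape)
    gamma = noise_scale * np.sqrt(t * (1.0 - t))
    return (1.0 - t) * x0 + t * x1 + gamma * z


rng = np.random.default_rng(0)

# Hypothetical toy data: expert (target) actions and a rough source policy
# whose actions are noisy guesses near the expert actions.
expert_actions = rng.uniform(-1.0, 1.0, size=(4, 2))
source_actions = expert_actions + 0.3 * rng.standard_normal((4, 2))

x_start = interpolant(source_actions, expert_actions, 0.0, rng)  # equals source
x_mid = interpolant(source_actions, expert_actions, 0.5, rng)    # noisy midpoint
x_end = interpolant(source_actions, expert_actions, 1.0, rng)    # equals target
```

In this sketch, replacing `source_actions` with draws from a standard Gaussian recovers the usual noise-to-action setup, which mirrors the abstract's point that Gaussian sources remain a special case.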