This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 11 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods, with an average improvement of 46.9%. Diffusion Policy learns the score function of the action distribution (the gradient of its log-density) and, during inference, iteratively optimizes with respect to this gradient field via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, scaling to high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions, including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models. Code, data, and training details will be made publicly available.
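The sampling mechanism described above (following a score field with injected noise until an action emerges) can be illustrated on a toy problem. The sketch below is a hypothetical illustration, not the paper's implementation: it replaces the learned, observation-conditioned network with the closed-form score of an assumed two-mode Gaussian mixture over a scalar action, and it uses annealed Langevin dynamics over an assumed geometric noise schedule rather than a trained diffusion scheduler. All names (`MODES`, `SIGMA`, `score`, `annealed_langevin_sample`, `geometric_noise_levels`) and parameter values are illustrative assumptions.

```python
import math
import random

# Hypothetical toy action distribution: an equal mixture of two 1-D Gaussians,
# standing in for two equally valid ways to perform a task (e.g. go left or right).
MODES = (-1.0, 1.0)
SIGMA = 0.1


def score(a: float, noise_std: float) -> float:
    """Closed-form d/da log p_s(a) for the mixture convolved with N(0, noise_std^2).

    In Diffusion Policy this quantity is learned by a network; here it is
    computed analytically purely for illustration.
    """
    var = SIGMA**2 + noise_std**2
    # Unnormalized log-weights of each component at a (equal priors, shared variance).
    logits = [-((a - m) ** 2) / (2 * var) for m in MODES]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    # Score = sum_i r_i(a) * (mu_i - a) / var, with r_i the component responsibilities.
    return sum(w / total * (m - a) / var for w, m in zip(weights, MODES))


def geometric_noise_levels(start: float, stop: float, n: int) -> list[float]:
    """Assumed schedule: n noise levels decreasing geometrically from start to stop."""
    ratio = (stop / start) ** (1 / (n - 1))
    return [start * ratio**k for k in range(n)]


def annealed_langevin_sample(rng: random.Random, noise_levels: list[float],
                             steps_per_level: int = 30, step_scale: float = 0.1) -> float:
    """Draw one action with Langevin dynamics, annealing from high to low noise."""
    a = rng.gauss(0.0, noise_levels[0])  # start from broad Gaussian noise
    for s in noise_levels:
        # Assumed heuristic: step size proportional to the noised target's variance.
        eps = step_scale * (SIGMA**2 + s**2)
        for _ in range(steps_per_level):
            # Step along the score (gradient of log-density) plus injected noise.
            a += 0.5 * eps * score(a, s) + math.sqrt(eps) * rng.gauss(0.0, 1.0)
    return a


if __name__ == "__main__":
    rng = random.Random(0)
    levels = geometric_noise_levels(2.0, 0.05, 10)
    samples = [annealed_langevin_sample(rng, levels) for _ in range(500)]
    frac_right = sum(a > 0 for a in samples) / len(samples)
    print(f"fraction of samples near +1: {frac_right:.2f}")
```

Because each chain commits to one mode as the noise shrinks, roughly half of the samples should land near -1 and half near +1, rather than collapsing to their average at 0 as a regression to the mean would. This is the multimodal behavior the abstract attributes to the diffusion formulation.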