Recent work has shown that diffusion models are an effective approach to learning the multimodal distributions arising from demonstration data in behavior cloning. However, a drawback of this approach is the need to learn a denoising function, which is significantly more complex than learning an explicit policy. In this work, we propose Equivariant Diffusion Policy, a novel diffusion policy learning method that leverages domain symmetries to obtain better sample efficiency and generalization in the denoising function. We theoretically analyze the $\mathrm{SO}(2)$ symmetry of full 6-DoF control and characterize when a diffusion model is $\mathrm{SO}(2)$-equivariant. We further evaluate the method empirically on a set of 12 simulation tasks in MimicGen and show that it obtains a success rate that is, on average, 21.9% higher than the baseline Diffusion Policy. We also evaluate the method on a real-world system to show that effective policies can be learned with relatively few training samples, whereas the baseline Diffusion Policy cannot.
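To make the notion of an $\mathrm{SO}(2)$-equivariant denoising function concrete, the following is a minimal sketch under stated assumptions, not the architecture proposed in this work. It assumes the observation $o$ and noisy action $a^k$ are each reduced to a single planar vector, and that equivariance means $\varepsilon(g \cdot o, g \cdot a^k, k) = g \cdot \varepsilon(o, a^k, k)$ for every rotation $g \in \mathrm{SO}(2)$. The notation and the names `toy_equivariant_denoiser`, `toy_biased_denoiser`, and `equivariance_error` are hypothetical and introduced only for illustration.

```python
import math


def rotate(theta, v):
    """Apply a planar rotation by angle theta (an element of SO(2)) to v = (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def toy_equivariant_denoiser(obs, noisy_action, k):
    """Hypothetical toy denoiser (an assumption, not the paper's network).

    The output is a combination of the input vectors whose scalar
    coefficients depend only on rotation-invariant quantities (norms, the
    dot product, and the step index k), so rotating both inputs rotates
    the output by the same angle.
    """
    r_obs = math.hypot(obs[0], obs[1])
    r_act = math.hypot(noisy_action[0], noisy_action[1])
    dot = obs[0] * noisy_action[0] + obs[1] * noisy_action[1]
    alpha = math.tanh(r_act + 0.1 * k)
    beta = math.sin(dot) / (1.0 + r_obs)
    return (
        alpha * noisy_action[0] + beta * obs[0],
        alpha * noisy_action[1] + beta * obs[1],
    )


def toy_biased_denoiser(obs, noisy_action, k):
    """Same toy denoiser plus a fixed offset along x, which breaks the symmetry."""
    e = toy_equivariant_denoiser(obs, noisy_action, k)
    return (e[0] + 0.5, e[1])


def equivariance_error(denoiser, obs, noisy_action, k, theta):
    """Largest componentwise gap between eps(g.o, g.a, k) and g.eps(o, a, k)."""
    lhs = denoiser(rotate(theta, obs), rotate(theta, noisy_action), k)
    rhs = rotate(theta, denoiser(obs, noisy_action, k))
    return max(abs(lhs[0] - rhs[0]), abs(lhs[1] - rhs[1]))
```

Calling `equivariance_error(toy_equivariant_denoiser, (0.3, -1.2), (0.8, 0.4), 5, 0.7)` should return a value at floating-point round-off level, while the same call with `toy_biased_denoiser` returns a clearly nonzero gap. A check of this kind only probes the symmetry at sampled inputs and angles; it does not replace the theoretical characterization.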