Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually requires large amounts of human demonstrations. To tackle this problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the use of a compact 3D visual representation, extracted from sparse point clouds with an efficient point encoder. In our experiments involving 72 simulation tasks, DP3 successfully handles most tasks with just 10 demonstrations and surpasses baselines with a 55.3% relative improvement. In 4 real robot tasks, DP3 demonstrates precise control with a high success rate of 85%, given only 40 demonstrations of each task, and shows excellent generalization abilities in diverse aspects, including space, viewpoint, appearance, and instance. Interestingly, in real robot experiments, DP3 rarely violates safety requirements, whereas baseline methods frequently do and thus require human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available at https://3d-diffusion-policy.github.io.
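To make the idea of a compact representation extracted from a sparse point cloud concrete, the following is a minimal sketch, not the authors' implementation. It assumes a PointNet-style design (a shared per-point MLP followed by max pooling), which the abstract does not specify; the function names, layer sizes, point count, and random weights are all hypothetical and chosen only for illustration.

```python
import numpy as np

def init_encoder(rng, in_dim=3, hidden_dim=64, out_dim=64):
    """Create random weights for a toy per-point MLP (sizes are illustrative assumptions)."""
    return {
        "w1": rng.standard_normal((in_dim, hidden_dim)) * 0.1,
        "b1": np.zeros(hidden_dim),
        "w2": rng.standard_normal((hidden_dim, out_dim)) * 0.1,
        "b2": np.zeros(out_dim),
    }

def encode_point_cloud(params, points):
    """Map an (N, 3) point cloud to a single (out_dim,) feature vector."""
    # Shared layer applied to every point independently, followed by ReLU.
    h = np.maximum(points @ params["w1"] + params["b1"], 0.0)
    h = h @ params["w2"] + params["b2"]
    # Max pooling over the point axis makes the result independent of point order.
    return h.max(axis=0)

rng = np.random.default_rng(0)
params = init_encoder(rng)

# A hypothetical sparse cloud of 512 points with xyz coordinates in [-1, 1].
cloud = rng.uniform(-1.0, 1.0, size=(512, 3))
feature = encode_point_cloud(params, cloud)

# Shuffling the points should leave the pooled feature unchanged.
shuffled = encode_point_cloud(params, cloud[rng.permutation(len(cloud))])
```

In a policy of this kind, a vector such as `feature` would serve as the conditioning input to the action-generating diffusion model; the pooling step is what keeps the representation compact regardless of how many points the cloud contains.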