Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually requires large amounts of human demonstrations. To tackle this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the use of a compact 3D visual representation, extracted from sparse point clouds with an efficient point encoder. In our experiments involving 72 simulation tasks, DP3 successfully handles most tasks with just 10 demonstrations and surpasses baselines with a 24.2% relative improvement. In 4 real-robot tasks, DP3 demonstrates precise control with a high success rate of 85%, given only 40 demonstrations per task, and shows excellent generalization along diverse axes, including space, viewpoint, appearance, and instance. Interestingly, in real-robot experiments, DP3 rarely violates safety requirements, in contrast to baseline methods, which frequently do and thus require human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available at https://3d-diffusion-policy.github.io .
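To make the core design concrete, below is a minimal, hypothetical sketch of a compact point-cloud encoder in the spirit described above: a shared per-point MLP followed by order-invariant max pooling, yielding a single global feature that conditions the diffusion-based action head. The class name, hidden sizes, and output dimension are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal sketch (not the authors' exact architecture) of a compact point-cloud
# encoder: a small per-point MLP followed by max pooling, producing one global
# feature vector per observation that conditions the diffusion policy.
import torch
import torch.nn as nn

class CompactPointEncoder(nn.Module):
    """Encode a sparse point cloud (B, N, 3) into a compact global feature (B, D)."""
    def __init__(self, out_dim: int = 64):
        super().__init__()
        # Shared per-point MLP; hidden sizes here are illustrative choices.
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.LayerNorm(64), nn.ReLU(),
            nn.Linear(64, 128), nn.LayerNorm(128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) sparse point cloud (e.g., a few hundred points)
        feats = self.mlp(points)           # (B, N, out_dim) per-point features
        global_feat, _ = feats.max(dim=1)  # (B, out_dim) order-invariant pooling
        return global_feat

# Usage: the resulting feature would typically be concatenated with robot
# proprioception and passed as conditioning to the action generator.
enc = CompactPointEncoder()
obs = torch.randn(2, 512, 3)   # batch of 2 clouds, 512 points each
print(enc(obs).shape)          # torch.Size([2, 64])
```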