In this study, we explore the influence of different observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud. Through extensive experiments on over 17 varied contact-rich manipulation tasks across two benchmarks and simulators, we observe a notable trend: point cloud-based methods, even those with the simplest designs, frequently outperform their RGB and RGB-D counterparts. This trend holds both when training from scratch and when using pretraining. Furthermore, our findings indicate that point cloud observations improve zero-shot policy generalization under changes in various geometric and visual cues, including camera viewpoint, lighting conditions, noise level, and background appearance. These results suggest that 3D point clouds are a valuable observation modality for intricate robotic tasks. We will open-source all our code and checkpoints, hoping that our insights can help design more generalizable and robust robotic models.