In robot learning, the choice of observation space is crucial: modalities differ markedly in their characteristics, and the observation space can become a performance bottleneck alongside policy design. In this study, we examine how different observation spaces affect robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud. We introduce OBSBench, a benchmark comprising two simulators and 125 tasks, along with standardized pipelines for various encoders and policy baselines. Extensive experiments on diverse contact-rich manipulation tasks reveal a notable trend: point cloud-based methods, even those with the simplest designs, frequently outperform their RGB and RGB-D counterparts. This trend holds both when training from scratch and when using pre-training. Furthermore, point cloud observations often yield better policy performance and significantly stronger generalization across various geometric and visual conditions. These outcomes suggest that the 3D point cloud is a valuable observation modality for intricate robotic tasks. Our results also suggest that incorporating both appearance and coordinate information can enhance the performance of point cloud methods. We hope our work provides useful insights and guidance for designing more generalizable and robust robotic models. Code is available at https://github.com/HaoyiZhu/PointCloudMatters.
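To make the relationship between the RGB-D and point cloud modalities concrete, the following is a minimal sketch of how an RGB-D frame might be back-projected into a point cloud that carries both coordinate and appearance information. It is an illustration only, not the OBSBench pipeline: the function name `rgbd_to_colored_points`, the pinhole camera model, and the intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions introduced here.

```python
import numpy as np


def rgbd_to_colored_points(rgb, depth, fx, fy, cx, cy):
    """Back-project an RGB-D frame into an (N, 6) array of [x, y, z, r, g, b].

    Hypothetical helper. Assumes a pinhole camera, depth measured along the
    optical axis, and rgb as uint8 with shape (H, W, 3). Pixels with
    non-positive depth are treated as invalid and dropped.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32)
    # Standard pinhole back-projection into the camera frame.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    xyz = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Appearance features: colors scaled to [0, 1].
    colors = rgb.reshape(-1, 3).astype(np.float32) / 255.0
    valid = xyz[:, 2] > 0
    # Concatenate coordinates and colors per point.
    return np.concatenate([xyz[valid], colors[valid]], axis=1).astype(np.float32)


# Toy 2x2 frame with one invalid (zero-depth) pixel.
rgb = np.full((2, 2, 3), 255, dtype=np.uint8)
depth = np.array([[1.0, 0.0], [2.0, 1.0]])
points = rgbd_to_colored_points(rgb, depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(points.shape)  # (3, 6)
```

Each row of the result holds a 3D coordinate followed by a color, so a point cloud encoder consuming this array sees both kinds of information per point. Dropping the last three columns gives a coordinates-only input for comparison.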