Perceiving the environment via cameras is crucial for Reinforcement Learning (RL) in robotics. While images are a convenient form of representation, they often complicate the extraction of important geometric details, especially with varying geometries or deformable objects. In contrast, point clouds naturally represent this geometry and easily integrate color and positional data from multiple camera views. However, while deep learning on point clouds has seen many recent successes, RL on point clouds is under-researched, with only the simplest encoder architectures considered in the literature. We introduce PointPatchRL (PPRL), a method for RL on point clouds that builds on the common paradigm of dividing point clouds into overlapping patches, tokenizing them, and processing the tokens with transformers. PPRL provides significant improvements over other point-cloud processing architectures previously used for RL. We then complement PPRL with masked reconstruction for representation learning and show that our method outperforms strong model-free and model-based baselines on image observations in complex manipulation tasks containing deformable objects and variations in target object geometry. Videos and code are available at https://alrhub.github.io/pprl-website.
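The patch paradigm the abstract refers to can be illustrated with a minimal sketch: sample patch centers with farthest point sampling, group each center's k nearest neighbors into an (overlapping) patch, and express coordinates relative to the center so each patch is ready for tokenization. This is an assumption-laden illustration of the general paradigm, not the paper's actual implementation; all function names and parameters here are hypothetical.

```python
import numpy as np

def farthest_point_sampling(points, n_centers, seed=0):
    """Greedily pick n_centers indices of points that are maximally spread out."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(points.shape[0]))]
    dists = np.linalg.norm(points - points[centers[0]], axis=1)
    for _ in range(n_centers - 1):
        idx = int(np.argmax(dists))  # farthest point from all chosen centers
        centers.append(idx)
        dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
    return np.array(centers)

def patchify(points, n_patches=8, patch_size=32):
    """Group a point cloud into overlapping patches around FPS centers.

    Returns center-relative patch coordinates (n_patches, patch_size, 3)
    and the patch centers (n_patches, 3); these patches would then be fed
    to a learned tokenizer and a transformer encoder.
    """
    center_idx = farthest_point_sampling(points, n_patches)
    centers = points[center_idx]
    # k nearest neighbours of each center form one (possibly overlapping) patch
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    knn = np.argsort(d, axis=1)[:, :patch_size]
    patches = points[knn] - centers[:, None, :]  # center-relative coordinates
    return patches, centers

# Toy point cloud: 256 random points in the unit cube.
cloud = np.random.default_rng(1).random((256, 3))
patches, centers = patchify(cloud)
print(patches.shape, centers.shape)  # (8, 32, 3) (8, 3)
```

Because patches are defined by k-nearest-neighbor grouping rather than a disjoint partition, neighboring patches overlap, mirroring the overlapping-patch tokenization the abstract describes.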