Accurate grasping is key to many robotic tasks, including assembly and household robotics. Executing a successful grasp in a cluttered environment requires multiple levels of scene understanding. First, the robot needs to analyze the geometric properties of individual objects to find feasible grasps that comply with the local object geometry. Second, for each proposed grasp, the robot needs to reason about interactions with other objects in the scene. Finally, the robot must compute a collision-free grasp trajectory while taking into account the geometry of the target object. Most grasp detection algorithms predict grasp poses directly in a monolithic fashion, which does not capture the composability of the environment. In this paper, we introduce an end-to-end architecture for object-centric grasping. The method takes point cloud data from a single, arbitrary viewing direction as input and generates an instance-centric representation for each partially observed object in the scene. This representation is then used for object reconstruction and grasp detection in cluttered table-top scenes. Extensive evaluation against state-of-the-art methods on synthetic datasets indicates superior performance for both grasping and reconstruction. Additionally, we demonstrate real-world applicability by decluttering scenes with varying numbers of objects.