This paper presents Sim-Suction, a robust object-aware suction grasp policy for mobile manipulation platforms with dynamic camera viewpoints, designed to pick up unknown objects from cluttered environments. Suction grasp policies typically employ data-driven approaches, which require large-scale, accurately annotated suction grasp datasets. However, the generation of suction grasp datasets in cluttered environments remains underexplored, leaving uncertainties about the relationship between the object of interest and its surroundings. To address this, we propose a benchmark synthetic dataset, Sim-Suction-Dataset, comprising 500 cluttered environments with 3.2 million annotated suction grasp poses. The efficient Sim-Suction-Dataset generation process combines analytical models with dynamic physical simulations to produce fast and accurate suction grasp pose annotations. We introduce Sim-Suction-Pointnet, which generates robust 6D suction grasp poses by learning point-wise affordances from the Sim-Suction-Dataset and leverages zero-shot text-to-segmentation. Real-world experiments on picking up all objects demonstrate that Sim-Suction-Pointnet achieves success rates of 96.76%, 94.23%, and 92.39% on cluttered level 1 objects (prismatic shapes), cluttered level 2 objects (more complex geometry), and cluttered mixed objects, respectively. The Sim-Suction policies outperform the state-of-the-art benchmark methods tested by approximately 21% in cluttered mixed scenes.