Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic grasping. However, previous grasping models face challenges in cluttered environments where nearby objects occlude the target and impede its grasp. In this paper, we establish a new benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO. We make the following contributions: 1) We are the first to study the occlusion level of grasping. 2) We set up an evaluation benchmark consisting of large-scale synthetic data and a subset of real-world data; evaluating five grasp models on it, we find that even the current SOTA model degrades as the occlusion level increases, showing that grasping under occlusion remains a challenge. 3) We generate a large-scale training dataset via a scalable pipeline, which boosts the performance of grasping under occlusion and generalizes to the real world. 4) We further propose a transformer-based grasping model with a shape completion module, termed TARGO-Net, which performs most robustly as occlusion increases. Our benchmark dataset is available at https://TARGO-benchmark.github.io/.