Robotic grasping is a fundamental ability for a robot to interact with its environment. Current methods focus on obtaining a stable and reliable grasp pose at the object level, while little work has addressed part (shape)-wise grasping, which is related to fine-grained grasping and robotic affordance. Parts can be seen as the atomic elements that compose an object; they carry rich semantic knowledge and correlate strongly with affordance. However, the lack of a large part-wise 3D robotic dataset limits the development of part representation learning and downstream applications. In this paper, we propose a new large Language-guided SHape grAsPing datasEt (named Lang-SHAPE) to learn 3D part-wise affordance and grasping ability. We design a novel two-stage fine-grained robotic grasping network (named PIONEER), comprising a novel 3D part language grounding model and a part-aware grasp pose detection model. To evaluate its effectiveness, we perform part language grounding grasping experiments at multiple difficulty levels and deploy our proposed model on a real robot. Results show that our method achieves satisfactory performance and efficiency in reference identification, affordance inference, and 3D part-aware grasping. Our dataset and code are available on our project website: https://sites.google.com/view/lang-shape