Task-oriented grasping of unfamiliar objects is a necessary skill for robots in dynamic in-home environments. Inspired by the human ability to grasp such objects through intuition about their shape and structure, we present a novel zero-shot task-oriented grasping method that geometrically decomposes the target object into simple convex shapes, represented in a graph structure together with their geometric attributes and spatial relationships. Our approach requires only minimal essential information, namely the object's name and the intended task, to perform zero-shot task-oriented grasping. We leverage the commonsense reasoning capabilities of large language models to dynamically assign semantic meaning to each decomposed part and then reason about the utility of each part for the intended task. Through extensive experiments on a real-world robotics platform, we demonstrate that our decomposition and reasoning pipeline selects the correct part in 92% of cases and successfully grasps the object in 82% of the evaluated tasks. Additional videos, experiments, code, and data are available on our project website: https://shapegrasp.github.io/.