Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that enable subsequent manipulation tasks. To model the complex relationships between objects, tasks, and grasps, existing methods incorporate semantic knowledge as priors into TOG pipelines. However, this semantic knowledge is typically constructed from closed-world concept sets, which restricts generalization to novel concepts outside the pre-defined sets. To address this issue, we propose GraspGPT, a large language model (LLM) based TOG framework that leverages the open-ended semantic knowledge of an LLM to achieve zero-shot generalization to novel concepts. We conduct experiments on the Language Augmented TaskGrasp (LA-TaskGrasp) dataset and demonstrate that GraspGPT outperforms existing TOG methods across different held-out settings when generalizing to novel concepts outside the training set. The effectiveness of GraspGPT is further validated in real-robot experiments. Our code, data, appendix, and video are publicly available at https://sites.google.com/view/graspgpt/.