Interactive Object Grasping (IOG) is the task of identifying and grasping the desired object via human-robot natural language interaction. Current IOG systems assume that a human user initially specifies the target object's category (e.g., bottle). Inspired by pragmatics, where humans often convey their intentions by relying on context to achieve goals, we introduce a new IOG task, Pragmatic-IOG, and the corresponding dataset, Intention-oriented Multi-modal Dialogue (IM-Dial). In our proposed task scenario, the robot is initially given an intention-oriented utterance (e.g., "I am thirsty") and must then identify the target object by interacting with a human user. Based on this task setup, we propose Pragmatic Object Grasping (PROGrasp), a new robotic system that interprets the user's intention and picks up the target object. PROGrasp performs Pragmatic-IOG by incorporating modules for visual grounding, question asking, object grasping, and, most importantly, answer interpretation for pragmatic inference. Experimental results show that PROGrasp is effective in both offline (i.e., target object discovery) and online (i.e., IOG with a physical robot arm) settings. Code and data are available at https://github.com/gicheonkang/prograsp.