Gesture-Informed Robot Assistance via Foundation Models

Gestures serve as a fundamental and significant mode of non-verbal communication among humans. Deictic gestures (such as pointing towards an object), in particular, offer a valuable means of efficiently expressing intent in situations where language is inaccessible, restricted, or highly specialized. As a result, it is essential for robots to comprehend gestures in order to infer human intentions and establish more effective coordination with them. Prior work often relies on a rigid, hand-coded library of gestures along with their meanings. However, the interpretation of gestures is often context-dependent, requiring more flexibility and common-sense reasoning. In this work, we propose a framework, GIRAF, for more flexibly interpreting gesture and language instructions by leveraging the power of large language models. Our framework is able to accurately infer human intent and contextualize the meaning of their gestures for more effective human-robot collaboration. We instantiate the framework for interpreting deictic gestures in table-top manipulation tasks and demonstrate that it is both effective and preferred by users, achieving 70% higher success rates than the baseline. We further demonstrate GIRAF's ability to reason about diverse types of gestures by curating a GestureInstruct dataset consisting of 36 different task scenarios. GIRAF achieved an 81% success rate on finding the correct plan for tasks in GestureInstruct. Website: https://tinyurl.com/giraf23

