Vision-Language Interpreter for Robot Task Planning

Keisuke Shirai, Cristian C. Beltran-Hernandez, Masashi Hamaya, Atsushi Hashimoto, Shohei Tanaka, Kento Kawaharazuka, Kazutoshi Tanaka, Yoshitaka Ushiku, Shinsuke Mori

Large language models (LLMs) are accelerating the development of language-guided robot planners. Meanwhile, symbolic planners offer the advantage of interpretability. This paper proposes a new task that bridges these two trends, namely, multimodal planning problem specification. The aim is to generate a problem description (PD), a machine-readable file that planners use to find a plan. By generating PDs from a language instruction and a scene observation, we can drive symbolic planners in a language-guided framework. We propose the Vision-Language Interpreter (ViLaIn), a new framework that generates PDs using state-of-the-art LLMs and vision-language models. ViLaIn can refine generated PDs via error message feedback from the symbolic planner. Our aim is to answer the question: How accurately can ViLaIn and the symbolic planner generate valid robot plans? To evaluate ViLaIn, we introduce a novel dataset called the problem description generation (ProDG) dataset. The framework is evaluated with four new evaluation metrics. Experimental results show that ViLaIn can generate syntactically correct problems with more than 99% accuracy and valid plans with more than 58% accuracy.
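The refinement step described in the abstract (generate a PD, run the planner, feed any error message back into generation) can be sketched as a simple loop. The following is a minimal illustration under stated assumptions, not the authors' implementation: `refine_pd`, `toy_generate`, and `toy_plan` are hypothetical names, the PD is a PDDL-style toy string chosen for illustration, and the generator and planner are stand-in stubs rather than a real LLM or symbolic planner.

```python
from typing import Callable, List, Optional, Tuple

# A planner call returns (plan_steps, error_message); exactly one is None.
PlanResult = Tuple[Optional[List[str]], Optional[str]]


def refine_pd(
    generate: Callable[[str, str, Optional[str]], str],
    plan: Callable[[str], PlanResult],
    instruction: str,
    observation: str,
    max_rounds: int = 3,
) -> Tuple[str, Optional[List[str]]]:
    """Generate a PD and regenerate it with planner feedback until it plans."""
    feedback: Optional[str] = None
    pd = ""
    for _ in range(max_rounds):
        pd = generate(instruction, observation, feedback)
        steps, error = plan(pd)
        if error is None:
            return pd, steps  # the planner accepted the PD
        feedback = error  # pass the error message to the next attempt
    return pd, None  # no valid plan within the round budget


def toy_generate(instruction: str, observation: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for the model: omits the goal until it gets feedback."""
    objects = "(:objects " + observation + ")"
    if feedback is None:
        return "(define (problem demo) " + objects + ")"
    return "(define (problem demo) " + objects + " (:goal (" + instruction + ")))"


def toy_plan(pd: str) -> PlanResult:
    """Hypothetical stand-in for the planner: checks syntax and a goal section."""
    if pd.count("(") != pd.count(")"):
        return None, "syntax error: unbalanced parentheses"
    if "(:goal" not in pd:
        return None, "missing :goal section"
    return ["noop"], None


if __name__ == "__main__":
    final_pd, steps = refine_pd(toy_generate, toy_plan, "sliced cucumber", "cucumber knife")
    print(final_pd)
    print(steps)
```

In this toy run the first PD lacks a goal, the stub planner reports that, and the second attempt succeeds. Capping the loop with `max_rounds` is a design choice of this sketch; the abstract does not state how many refinement rounds ViLaIn uses.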

