Vision-Language Interpreter for Robot Task Planning

Keisuke Shirai, Cristian C. Beltran-Hernandez, Masashi Hamaya, Atsushi Hashimoto, Shohei Tanaka, Kento Kawaharazuka, Kazutoshi Tanaka, Yoshitaka Ushiku, Shinsuke Mori

From arXiv, ICRA 2024

Large language models (LLMs) are accelerating the development of language-guided robot planners. Meanwhile, symbolic planners offer the advantage of interpretability. This paper proposes a new task that bridges these two trends, namely, multimodal planning problem specification. The aim is to generate a problem description (PD), a machine-readable file used by the planners to find a plan. By generating PDs from language instruction and scene observation, we can drive symbolic planners in a language-guided framework. We propose a Vision-Language Interpreter (ViLaIn), a new framework that generates PDs using state-of-the-art LLM and vision-language models. ViLaIn can refine generated PDs via error message feedback from the symbolic planner. Our aim is to answer the question: How accurately can ViLaIn and the symbolic planner generate valid robot plans? To evaluate ViLaIn, we introduce a novel dataset called the problem description generation (ProDG) dataset. The framework is evaluated with four new evaluation metrics. Experimental results show that ViLaIn can generate syntactically correct problems with more than 99% accuracy and valid plans with more than 58% accuracy. Our code and dataset are available at https://github.com/omron-sinicx/ViLaIn.
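
The generate-then-refine idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `generate_with_feedback`, `toy_generate`, `toy_solve`, and `PlannerResult` are hypothetical, the generator and planner are toy stand-ins for the LLM/vision-language models and the symbolic planner, and the PD text is only PDDL-like. It shows one plausible shape of the loop: generate a PD, hand it to the planner, and feed any error message back into the next generation attempt.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class PlannerResult:
    """Hypothetical container for what a symbolic planner returns."""
    plan: Optional[List[str]]   # action sequence, or None if planning failed
    error: Optional[str]        # error message, or None on success


def generate_with_feedback(
    generate: Callable[[str, List[str], Optional[str]], str],
    solve: Callable[[str], PlannerResult],
    instruction: str,
    observation: List[str],
    max_rounds: int = 3,
) -> Tuple[str, Optional[List[str]]]:
    """Generate a PD, then regenerate it using planner errors as feedback."""
    feedback: Optional[str] = None
    pd = ""
    for _ in range(max_rounds):
        pd = generate(instruction, observation, feedback)
        result = solve(pd)
        if result.error is None:
            return pd, result.plan      # planner accepted the PD
        feedback = result.error         # pass the error to the next attempt
    return pd, None                     # no valid plan within the budget


def toy_generate(instruction: str, observation: List[str],
                 feedback: Optional[str]) -> str:
    """Toy stand-in for the model: omits the goal until it receives feedback."""
    objects = " ".join(observation)
    goal = "(:goal (on block_a block_b))" if feedback else ""
    return (f"(define (problem demo) (:objects {objects}) "
            f"(:init (on-table block_a)) {goal})")


def toy_solve(pd: str) -> PlannerResult:
    """Toy stand-in for the planner: checks only parentheses and a goal."""
    if pd.count("(") != pd.count(")"):
        return PlannerResult(None, "unbalanced parentheses")
    if "(:goal" not in pd:
        return PlannerResult(None, "missing :goal section")
    return PlannerResult(["stack block_a block_b"], None)


if __name__ == "__main__":
    pd, plan = generate_with_feedback(
        toy_generate, toy_solve, "stack a on b", ["block_a", "block_b"])
    print(pd)
    print(plan)
```

Passing the generator and planner in as callables keeps the loop independent of any particular model or planner, so real components could be substituted without changing the control flow.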

