This paper makes two contributions. First, we introduce "Language-World," an extension of the Meta-World benchmark that allows a large language model (LLM) to operate in a simulated robotic environment using semi-structured natural language queries and scripted skills described in natural language. Because Language-World uses the same set of tasks as Meta-World, its results can be easily compared to Meta-World results, providing a point of comparison between recent methods that use LLMs and those that use deep reinforcement learning. Second, we introduce Plan Conditioned Behavioral Cloning (PCBC), a method for finetuning the behavior of high-level plans using end-to-end demonstrations. Using Language-World, we show that PCBC achieves strong performance in a variety of few-shot regimes, often achieving task generalization with as little as a single demonstration. We have made Language-World available as open-source software at https://github.com/krzentner/language-world/.