Language instructions and demonstrations are two natural ways for users to teach robots personalized tasks. Recent progress in Large Language Models (LLMs) has shown impressive performance in translating language instructions into code for robotic tasks. However, translating demonstrations into task code remains a challenge: both demonstrations and code are long and complex, which makes learning a direct mapping intractable. This paper presents Demo2Code, a novel framework that generates robot task code from demonstrations via an extended chain-of-thought and defines a common latent specification that connects demonstrations and code. The framework uses a two-stage process: (1) a recursive summarization technique that condenses demonstrations into concise specifications, and (2) a code synthesis approach that expands each function recursively from the generated specifications. We conduct extensive evaluation on various robot task benchmarks, including Robotouille, a novel game benchmark designed to simulate diverse cooking tasks in a kitchen environment.
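The two-stage process described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `summarize_recursively` and `expand_recursively`, the chunking scheme, the primitive `move_to`, and the toy stand-ins `toy_summarize` and `toy_define` are all hypothetical. In a real system, both stand-ins would presumably be LLM calls.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


def summarize_recursively(
    lines: List[str],
    summarize_chunk: Callable[[List[str]], List[str]],
    max_len: int,
    chunk_size: int = 4,
) -> List[str]:
    """Stage 1 (sketch): condense a demonstration until it is short enough."""
    while len(lines) > max_len:
        chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
        condensed = [s for chunk in chunks for s in summarize_chunk(chunk)]
        if len(condensed) >= len(lines):
            break  # the summarizer made no progress; stop instead of looping forever
        lines = condensed
    return lines


def expand_recursively(
    name: str,
    define: Callable[[str], Tuple[str, List[str]]],
    known: Set[str],
    out: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Stage 2 (sketch): define `name`, then define every helper it calls."""
    if out is None:
        out = {}
    if name in known or name in out:
        return out  # already available as a primitive or already expanded
    source, helpers = define(name)
    out[name] = source
    for helper in helpers:
        expand_recursively(helper, define, known, out)
    return out


# Hypothetical stand-in for an LLM summarizer: keep each chunk's final step.
def toy_summarize(chunk: List[str]) -> List[str]:
    return [chunk[-1]]


demo = [
    "move to stove", "pick up patty", "place patty on stove", "cook patty",
    "move to board", "pick up lettuce", "place lettuce on board", "cut lettuce",
]
spec = summarize_recursively(demo, toy_summarize, max_len=2)


# Hypothetical stand-in for an LLM code generator: returns (source, helpers called).
def toy_define(name: str) -> Tuple[str, List[str]]:
    if name == "main":
        calls = [line.split() for line in spec]
        body = "\n".join(f"    {verb}({obj!r})" for verb, obj in calls)
        return f"def main():\n{body}", [verb for verb, _ in calls]
    return f"def {name}(item):\n    move_to(item)\n    # low-level steps elided", ["move_to"]


# `move_to` is assumed to be a primitive the robot API already provides.
code = expand_recursively("main", toy_define, known={"move_to"})
print(spec)
print("\n\n".join(code.values()))
```

In this sketch the specification is the only thing passed from the first stage to the second, which is the role the abstract assigns to the common latent specification.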