Language instructions and demonstrations are two natural ways for users to teach robots personalized tasks. Recent progress in Large Language Models (LLMs) has shown impressive performance in translating language instructions into code for robotic tasks. However, translating demonstrations into task code remains a challenge: both demonstrations and code are long and complex, which makes learning a direct mapping intractable. This paper presents Demo2Code, a novel framework that generates robot task code from demonstrations via an extended chain-of-thought and defines a common latent specification that connects demonstrations and code. Our framework employs a robust two-stage process: (1) a recursive summarization technique that condenses demonstrations into concise specifications, and (2) a code synthesis approach that expands each function recursively from the generated specifications. We conduct an extensive evaluation on various robot task benchmarks, including a novel game benchmark, Robotouille, designed to simulate diverse cooking tasks in a kitchen environment. The project's website is available at https://portal-cornell.github.io/demo2code-webpage
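The two-stage process described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`recursive_summarize`, `recursive_expand`), the length budget, and the toy `condense` and `define` callables are all hypothetical, and the callables stand in for what would be LLM calls in the actual framework. The sketch only shows the control flow: condense repeatedly until the specification is short enough, then define a top-level function and recurse into every helper it calls that is not yet defined.

```python
import re
from typing import Callable, Dict, Optional, Set


def recursive_summarize(text: str, condense: Callable[[str], str], max_len: int) -> str:
    """Stage 1 (sketch): condense repeatedly until the text fits the budget."""
    while len(text) > max_len:
        shorter = condense(text)
        if len(shorter) >= len(text):  # no progress, so stop instead of looping forever
            break
        text = shorter
    return text


def recursive_expand(
    name: str,
    define: Callable[[str], str],
    primitives: Set[str],
    out: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Stage 2 (sketch): define `name`, then recurse into undefined helpers it calls."""
    if out is None:
        out = {}
    if name in primitives or name in out:
        return out
    body = define(name)
    out[name] = body
    # Treat every identifier followed by "(" as a call site (a crude heuristic).
    for callee in re.findall(r"\b([A-Za-z_]\w*)\s*\(", body):
        recursive_expand(callee, define, primitives, out)
    return out


# Toy stand-ins for the LLM calls (assumptions for illustration only).
def toy_condense(text: str) -> str:
    """Keep every other line of the demonstration."""
    return "\n".join(text.split("\n")[::2])


TOY_DEFS = {
    "main": "def main():\n    cook(); serve()",
    "cook": "def cook():\n    pick_up(); place()",
    "serve": "def serve():\n    pick_up(); place()",
}
PRIMITIVES = {"pick_up", "place"}  # assumed to be provided by the robot API

demo = "\n".join(f"state {i}" for i in range(8))
spec = recursive_summarize(demo, toy_condense, max_len=20)
code = recursive_expand("main", TOY_DEFS.__getitem__, PRIMITIVES)
```

In this toy run, `spec` holds a shortened demonstration and `code` maps each generated function name to its definition, in the order the functions were expanded. The `primitives` set is what stops the recursion: calls to functions the robot API is assumed to provide are left undefined.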