Code as Policies: Language Model Programs for Embodied Control

Large language models (LLMs) trained on code completion have been shown to be capable of synthesizing simple Python programs from docstrings [1]. We find that these code-writing LLMs can be re-purposed to write robot policy code, given natural language commands. Specifically, policy code can express functions or feedback loops that process perception outputs (e.g., from object detectors [2], [3]) and parameterize control primitive APIs. When given several example language commands (formatted as comments), each followed by the corresponding policy code (via few-shot prompting), LLMs can take in new commands and autonomously re-compose API calls to generate new policy code. By chaining classic logic structures and referencing third-party libraries (e.g., NumPy, Shapely) to perform arithmetic, LLMs used in this way can write robot policies that (i) exhibit spatial-geometric reasoning, (ii) generalize to new instructions, and (iii) prescribe precise values (e.g., velocities) for ambiguous descriptions ("faster") depending on context (i.e., behavioral commonsense). This paper presents code as policies: a robot-centric formulation of language model generated programs (LMPs) that can represent reactive policies (e.g., impedance controllers) as well as waypoint-based policies (vision-based pick and place, trajectory-based control), demonstrated across multiple real robot platforms. Central to our approach is hierarchical code-gen prompting (recursively defining undefined functions), which can write more complex code and also improves on the state of the art, solving 39.8% of problems on the HumanEval [1] benchmark. Code and videos are available at https://code-as-policies.github.io
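
The idea of hierarchical code generation (recursively defining undefined functions) can be illustrated with a minimal sketch. It is not the authors' implementation: `fake_llm` and its canned completions stand in for a real code-writing LLM, and the names `get_pos`, `put_first_on_second`, and `offset_right`, as well as the positions, are hypothetical and chosen only for illustration. The sketch assumes generated code calls helpers by plain name, so that missing definitions can be found with Python's `ast` module.

```python
import ast
import builtins

# Few-shot prompt: an example command as a comment, followed by policy code.
FEW_SHOT = """\
# put the blue block on the green bowl.
put_first_on_second('blue block', get_pos('green bowl'))
"""

# Hypothetical canned completions standing in for a real code-writing LLM.
CANNED = {
    "# move the red block a bit to the right.": (
        "target = offset_right(get_pos('red block'), 10)\n"
        "put_first_on_second('red block', target)"
    ),
    "# define function: offset_right.": (
        "def offset_right(pos, dist):\n"
        "    return (pos[0] + dist, pos[1])"
    ),
}


def fake_llm(prompt):
    """Return canned code for the final command line of the prompt."""
    return CANNED[prompt.strip().splitlines()[-1]]


def undefined_calls(code, known):
    """List names called in `code` that are not defined there, in `known`, or as builtins."""
    tree = ast.parse(code)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    called = [
        n.func.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    ]
    return [
        name
        for name in dict.fromkeys(called)  # de-duplicate, keep first-seen order
        if name not in defined and name not in known and not hasattr(builtins, name)
    ]


def generate(command, scope, llm):
    """Generate policy code, recursively generating and defining missing helpers."""
    code = llm(FEW_SHOT + command)
    for name in undefined_calls(code, scope):
        if name in scope:  # already defined by an earlier recursive call
            continue
        helper_src = generate(f"# define function: {name}.", scope, llm)
        exec(helper_src, scope)  # make the helper callable from policy code
    return code


# Hypothetical perception and control primitives exposed to the policy code.
positions = {"red block": (20, 50)}  # made-up coordinates in centimetres
log = []
scope = {
    "get_pos": lambda name: positions[name],
    "put_first_on_second": lambda obj, target: log.append((obj, target)),
}

policy = generate("# move the red block a bit to the right.", scope, fake_llm)
exec(policy, scope)
print(log)
```

In this sketch the generated policy calls `offset_right`, which none of the primitives provide, so `generate` issues a second prompt asking for its definition before the policy runs. Executing generated code with `exec` is done here only to keep the example short; a real system would need to sandbox or vet the code first.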
