Language to Rewards for Robotic Skill Synthesis

Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, Fei Xia

From arXiv. Project page: https://language-to-reward.github.io/

Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. On the other hand, reward functions have been shown to be flexible representations that can be optimized for control policies to achieve diverse tasks, while their semantic richness makes them suitable to be specified by LLMs. In this work, we introduce a new paradigm that harnesses this realization by using LLMs to define reward parameters that can be optimized to accomplish a variety of robotic tasks. Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions. Meanwhile, combining this with a real-time optimizer, MuJoCo MPC, enables an interactive behavior creation experience where users can immediately observe the results and provide feedback to the system. To systematically evaluate the performance of our proposed method, we designed a total of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. We demonstrate that our proposed method reliably tackles 90% of the designed tasks, while a baseline using primitive skills as the interface with Code-as-policies achieves 50% of the tasks. We further validated our method on a real robot arm, where complex manipulation skills such as non-prehensile pushing emerge through our interactive system.
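To make the "reward as interface" idea concrete, here is a minimal toy sketch. It is not the authors' implementation and does not use MuJoCo MPC or any language model. All names are hypothetical: `RewardParams` stands in for the reward parameters an LLM would emit, the "robot" is a one-dimensional point whose height follows a commanded velocity, and `plan` is a simple receding-horizon search over constant actions, swapped in for a real-time optimizer. A language correction is modeled as nothing more than a new set of reward parameters.

```python
from dataclasses import dataclass


@dataclass
class RewardParams:
    """Hypothetical reward parameters an LLM might emit for an instruction."""
    target_height: float  # desired height of the toy point
    height_weight: float  # penalty weight on height error
    effort_weight: float  # penalty weight on action magnitude


def reward(params: RewardParams, height: float, action: float) -> float:
    """Negative weighted cost, so larger values are better."""
    error = height - params.target_height
    return -(params.height_weight * error ** 2
             + params.effort_weight * action ** 2)


def step(height: float, action: float, dt: float = 0.05) -> float:
    """Toy first-order dynamics: the action is a commanded vertical velocity."""
    return height + action * dt


def plan(params: RewardParams, height: float, horizon: int = 10) -> float:
    """Pick the constant action whose rollout over the horizon scores best."""
    candidates = [i / 10 for i in range(-10, 11)]  # actions in [-1, 1]

    def rollout_return(action: float) -> float:
        h, total = height, 0.0
        for _ in range(horizon):
            h = step(h, action)
            total += reward(params, h, action)
        return total

    return max(candidates, key=rollout_return)


def run(params: RewardParams, steps: int = 200) -> float:
    """Replan at every step (receding horizon) and return the final height."""
    height = 0.0
    for _ in range(steps):
        height = step(height, plan(params, height))
    return height


# Stand-ins for LLM output; no language model is called in this sketch.
initial = RewardParams(target_height=0.5, height_weight=1.0, effort_weight=0.01)
# Assumed result of a user correction such as "a bit higher".
corrected = RewardParams(target_height=0.8, height_weight=1.0, effort_weight=0.01)

final_initial = run(initial)
final_corrected = run(corrected)
```

The point of the sketch is the division of labor: the language side only has to produce a few semantically meaningful numbers, while the optimizer turns them into low-level actions. Changing the behavior after feedback means changing `RewardParams`, not the controller.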
