Language to Rewards for Robotic Skill Synthesis

Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, Fei Xia

From arXiv. Project page: https://language-to-reward.github.io/

Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, because low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts to apply LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. Reward functions, on the other hand, have been shown to be flexible representations that can be optimized to obtain control policies for diverse tasks, and their semantic richness makes them well suited to being specified by LLMs. In this work, we introduce a new paradigm that harnesses this realization by using LLMs to define reward parameters that can be optimized to accomplish a variety of robotic tasks. Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions. Meanwhile, combining this with a real-time optimizer, MuJoCo MPC, enables an interactive behavior creation experience in which users can immediately observe the results and provide feedback to the system. To systematically evaluate the performance of our proposed method, we designed a total of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. We demonstrate that our proposed method reliably tackles 90% of the designed tasks, while a baseline that uses primitive skills as the interface with Code as Policies achieves 50% of the tasks. We further validated our method on a real robot arm, where complex manipulation skills such as non-prehensile pushing emerge through our interactive system.
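
To make the reward-as-interface idea concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a one-dimensional point whose velocity is commanded directly, a hypothetical `RewardParams` structure standing in for what an LLM would emit, and a brute-force receding-horizon planner standing in for MuJoCo MPC. All names, weights, and dynamics here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class RewardParams:
    """Hypothetical reward parameters; in the paper's setting an LLM would produce these."""
    target_position: float  # where the point should end up
    position_weight: float  # penalty weight on squared distance to the target
    effort_weight: float    # penalty weight on squared action magnitude


def step(position: float, action: float, dt: float = 0.1) -> float:
    """Toy dynamics (assumption): the action is a velocity command."""
    return position + action * dt


def reward(params: RewardParams, position: float, action: float) -> float:
    """Parameterised reward: negative weighted distance plus effort."""
    distance = (position - params.target_position) ** 2
    return -params.position_weight * distance - params.effort_weight * action ** 2


def plan(params: RewardParams, position: float, horizon: int = 5) -> float:
    """Score each candidate action held constant over the horizon; return the best one."""
    candidates = [i / 10 for i in range(-10, 11)]  # actions in [-1, 1]
    best_return, best_action = float("-inf"), 0.0
    for action in candidates:
        p, total = position, 0.0
        for _ in range(horizon):
            p = step(p, action)
            total += reward(params, p, action)
        if total > best_return:
            best_return, best_action = total, action
    return best_action


def run_episode(params: RewardParams, steps: int = 100) -> float:
    """Receding-horizon loop: replan at every step, apply only the first action."""
    position = 0.0
    for _ in range(steps):
        position = step(position, plan(params, position))
    return position


if __name__ == "__main__":
    # Two different "tasks" expressed only by changing reward parameters.
    print(run_episode(RewardParams(1.0, 1.0, 0.01)))
    print(run_episode(RewardParams(-0.5, 1.0, 0.01)))
```

The point of the sketch is that the planner and dynamics stay fixed while the behavior changes only through the reward parameters, which is the role the abstract assigns to the LLM-generated reward. A language correction from the user would, under this assumption, amount to emitting a new `RewardParams` value and replanning.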
