Goal-conditioned and Multi-Task Reinforcement Learning (GCRL and MTRL) address numerous problems in robot learning, including locomotion, navigation, and manipulation. Recent work on language-defined robotic manipulation tasks has relied on the tedious production of large volumes of human annotations to create datasets of textual descriptions associated with trajectories. To leverage reinforcement learning with text-based task descriptions, we need to produce reward functions associated with individual tasks in a scalable manner. In this paper, we leverage recent capabilities of Large Language Models (LLMs) and introduce \larg, Language-based Automatic Reward and Goal Generation, an approach that converts a text-based task description into its corresponding reward and goal-generation functions. We evaluate our approach on robotic manipulation and demonstrate its ability to train and execute policies in a scalable manner, without the need for handcrafted reward functions.