Although Deep Reinforcement Learning (DRL) has achieved notable success in numerous robotic applications, designing a high-performing reward function remains challenging and often requires substantial manual effort. Recently, Large Language Models (LLMs) have been widely adopted for tasks that demand in-depth common-sense knowledge, such as reasoning and planning. Because reward function design also draws heavily on such knowledge, LLMs show promising potential in this context. Motivated by this, in this work we propose a novel LLM framework with a self-refinement mechanism for automated reward function design. The LLM first formulates an initial reward function from natural-language inputs. The performance of this reward function is then assessed, and the results are fed back to the LLM to guide its self-refinement. We evaluate the proposed framework on a variety of continuous robotic control tasks across three diverse robotic systems. The results indicate that our LLM-designed reward functions can rival or even surpass manually designed ones, highlighting the efficacy and applicability of our approach.
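The propose, evaluate, and feed-back cycle described above can be illustrated with a toy sketch. The abstract does not specify the prompt format, the evaluation metric, or how feedback is summarized, so everything below is an assumption: `stub_llm` is a scripted placeholder for a real LLM call, `evaluate` replaces DRL training with a greedy one-step policy on a 1-D reaching task, and the names `Feedback`, `build_prompt`, `compile_reward`, and `self_refine` are hypothetical.

```python
"""Toy sketch of an LLM self-refinement loop for reward design (all names hypothetical)."""
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

ACTIONS = (-0.1, 0.0, 0.1)   # hypothetical discrete action set
GOAL = 1.0                   # hypothetical target position
STARTS = (0.0, 0.5, 1.5)     # hypothetical evaluation start states


@dataclass
class Feedback:
    success_rate: float
    mean_final_dist: float

    def to_text(self) -> str:
        # Summary that is pasted back into the next prompt (format is an assumption).
        return (f"success_rate={self.success_rate:.2f}, "
                f"mean_final_distance={self.mean_final_dist:.2f}")


def build_prompt(task: str, prev_code: Optional[str], fb: Optional[Feedback]) -> str:
    prompt = f"Task: {task}\nWrite a Python function reward(x, action, goal)."
    if prev_code is not None and fb is not None:
        # Self-refinement step: include the previous draft and its evaluation results.
        prompt += f"\nPrevious reward:\n{prev_code}\nFeedback: {fb.to_text()}\nRevise it."
    return prompt


def stub_llm(prompt: str) -> str:
    # Scripted placeholder for a real LLM: the first draft forgets the goal term;
    # any prompt that carries feedback receives a corrected draft.
    if "Feedback:" not in prompt:
        return "def reward(x, action, goal):\n    return -abs(action)\n"
    return ("def reward(x, action, goal):\n"
            "    return -abs(goal - x) - 0.01 * abs(action)\n")


def compile_reward(code: str) -> Callable[[float, float, float], float]:
    namespace: dict = {}
    exec(code, namespace)  # real systems should sandbox LLM-generated code
    return namespace["reward"]


def evaluate(reward_fn: Callable[[float, float, float], float],
             horizon: int = 20, tol: float = 0.05) -> Feedback:
    # Stand-in for DRL training: a greedy one-step policy picks the action whose
    # resulting position receives the highest reward.
    final_dists = []
    for x in STARTS:
        for _ in range(horizon):
            a = max(ACTIONS, key=lambda act: reward_fn(x + act, act, GOAL))
            x += a
        final_dists.append(abs(GOAL - x))
    successes = sum(d < tol for d in final_dists)
    return Feedback(successes / len(STARTS), sum(final_dists) / len(STARTS))


def self_refine(task: str, llm: Callable[[str], str], max_iters: int = 5,
                target: float = 1.0) -> Tuple[str, List[Feedback]]:
    code: Optional[str] = None
    fb: Optional[Feedback] = None
    history: List[Feedback] = []
    for _ in range(max_iters):
        code = llm(build_prompt(task, code, fb))   # initial design or refinement
        fb = evaluate(compile_reward(code))        # assess the reward function
        history.append(fb)
        if fb.success_rate >= target:              # stop once the target is met
            break
    return code, history                           # returns the final draft


if __name__ == "__main__":
    final_code, history = self_refine("Move a 1-D point to x = 1.", stub_llm)
    for i, fb in enumerate(history):
        print(i, fb.to_text())
    print(final_code)
```

In this toy run the first draft only penalizes effort, so the policy never moves and the success rate is 0; the feedback prompts a second draft with a distance term, which reaches the goal from every start. A real system would replace `evaluate` with actual policy training and `stub_llm` with an LLM API call.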