Although Deep Reinforcement Learning (DRL) has achieved notable success in numerous robotic applications, designing a high-performing reward function remains a challenging task that often requires substantial manual input. Recently, Large Language Models (LLMs) have been widely adopted for tasks that demand in-depth common-sense knowledge, such as reasoning and planning. Because reward function design also depends on such knowledge, LLMs offer promising potential in this context. Motivated by this, we propose a novel LLM framework with a self-refinement mechanism for automated reward function design. The framework begins with the LLM formulating an initial reward function from natural language inputs. The performance of the reward function is then assessed, and the results are fed back to the LLM to guide its self-refinement. We evaluate the proposed framework on a variety of continuous robotic control tasks across three diverse robotic systems. The results indicate that our LLM-designed reward functions can rival or even surpass manually designed reward functions, highlighting the efficacy and applicability of our approach.
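The propose, evaluate, and refine cycle described in the abstract can be sketched as a short loop. This is only an illustrative sketch, not the authors' implementation: the function name `refine_reward`, the callables `propose` and `evaluate`, the prompt wording, and the stopping parameters `max_rounds` and `target_score` are all assumptions introduced here. In practice `propose` would wrap an LLM call and `evaluate` would train a policy with the candidate reward and report how well the task was solved.

```python
from typing import Callable, List, Tuple


def refine_reward(
    propose: Callable[[str], str],
    evaluate: Callable[[str], Tuple[float, str]],
    task_description: str,
    max_rounds: int = 3,
    target_score: float = 1.0,
) -> Tuple[str, float, List[float]]:
    """Return the best reward-function source found, its score, and all scores.

    `propose` (hypothetical) maps a prompt to reward-function source code.
    `evaluate` (hypothetical) maps that source to (score, textual feedback).
    """
    prompt = task_description
    best_code, best_score = "", float("-inf")
    history: List[float] = []

    for _ in range(max_rounds):
        reward_code = propose(prompt)            # LLM writes or revises the reward
        score, feedback = evaluate(reward_code)  # assess the resulting behaviour
        history.append(score)

        if score > best_score:
            best_code, best_score = reward_code, score
        if score >= target_score:
            break

        # Present the results back to the LLM to guide the next revision.
        prompt = (
            f"{task_description}\n\n"
            f"Previous reward function:\n{reward_code}\n\n"
            f"Evaluation feedback:\n{feedback}\n\n"
            "Please revise the reward function."
        )

    return best_code, best_score, history
```

Keeping the best candidate seen so far, rather than the last one, guards against a revision that makes the reward worse; whether the original framework does this is not stated in the abstract.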