Designing effective reward functions remains a major challenge in reinforcement learning (RL), often requiring considerable human expertise and iterative refinement. Recent advances leverage Large Language Models (LLMs) for automated reward design, but these approaches suffer from hallucinations, reliance on human feedback, and difficulty with complex, multi-step tasks. In this work, we introduce Reward Evolution with Graph-of-Thoughts (RE-GoT), a novel bi-level framework that enhances LLMs with structured graph-based reasoning and integrates Vision-Language Models (VLMs) for automated rollout evaluation. RE-GoT first decomposes a task into a text-attributed graph, enabling comprehensive analysis and reward function generation, and then iteratively refines the reward using visual feedback from VLMs, without human intervention. Extensive experiments on 10 RoboGen and 4 ManiSkill2 tasks demonstrate that RE-GoT consistently outperforms existing LLM-based baselines. On RoboGen, our method improves average task success rates by 32.25%, with notable gains on complex multi-step tasks. On ManiSkill2, RE-GoT achieves an average success rate of 93.73% across four diverse manipulation tasks, significantly surpassing prior LLM-based approaches and even exceeding expert-designed rewards. Our results indicate that combining LLMs and VLMs with graph-of-thoughts reasoning provides a scalable and effective solution for autonomous reward evolution in RL.
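To make the bi-level structure concrete, the following is a minimal sketch of the loop the abstract describes: an upper level that decomposes the task into a text-attributed graph and generates a reward function, and a lower level that trains a policy and refines the reward from VLM feedback. All names here (`build_task_graph`, `llm_generate_reward`, `train_policy`, `vlm_evaluate_rollout`, the 0.9 success threshold) are hypothetical stand-ins, not the authors' implementation or API.

```python
# Hypothetical sketch of the RE-GoT bi-level loop; all helpers are stubs,
# not the authors' code. Shown only to illustrate the control flow.
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Text-attributed graph: nodes hold sub-task descriptions."""
    nodes: list = field(default_factory=list)  # e.g. ["grasp handle", "pull drawer"]
    edges: list = field(default_factory=list)  # (i, j) precedence links


def build_task_graph(task_description: str) -> TaskGraph:
    # Stand-in: in RE-GoT an LLM decomposes the task; here we fake one node.
    return TaskGraph(nodes=[task_description])


def llm_generate_reward(graph: TaskGraph, feedback: str) -> str:
    # Stand-in for an LLM call that emits reward-function code as text.
    return "def reward(state): return 0.0  # stub"


def train_policy(reward_code: str):
    # Stand-in for an RL training run using the generated reward.
    return {"rollout_video": None}


def vlm_evaluate_rollout(rollout) -> tuple[float, str]:
    # Stand-in for a VLM scoring the rollout and explaining failures.
    return 0.0, "gripper never reaches the handle"


def re_got_loop(task_description: str, n_iters: int = 3) -> str:
    """Upper level: graph reasoning; lower level: reward evolution."""
    graph = build_task_graph(task_description)
    reward_code, feedback = "", ""
    for _ in range(n_iters):
        reward_code = llm_generate_reward(graph, feedback)
        rollout = train_policy(reward_code)
        score, feedback = vlm_evaluate_rollout(rollout)
        if score > 0.9:  # assumed success threshold, purely illustrative
            break
    return reward_code


if __name__ == "__main__":
    print(re_got_loop("open the drawer"))
```

The key design point the sketch captures is that the refinement signal comes from the VLM's evaluation of rollouts rather than from a human in the loop.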