Designing effective reward functions remains a major challenge in reinforcement learning (RL), often requiring considerable human expertise and iterative refinement. Recent advances leverage Large Language Models (LLMs) for automated reward design, but these approaches are limited by hallucinations, reliance on human feedback, and difficulty handling complex, multi-step tasks. In this work, we introduce Reward Evolution with Graph-of-Thoughts (RE-GoT), a novel bi-level framework that enhances LLMs with structured graph-based reasoning and integrates Visual Language Models (VLMs) for automated rollout evaluation. RE-GoT first decomposes tasks into text-attributed graphs, enabling comprehensive analysis and reward function generation, and then iteratively refines rewards using visual feedback from VLMs without human intervention. Extensive experiments on 10 RoboGen and 4 ManiSkill2 tasks demonstrate that RE-GoT consistently outperforms existing LLM-based baselines. On RoboGen, our method improves average task success rates by 32.25%, with notable gains on complex multi-step tasks. On ManiSkill2, RE-GoT achieves an average success rate of 93.73% across four diverse manipulation tasks, significantly surpassing prior LLM-based approaches and even exceeding expert-designed rewards. Our results indicate that combining LLMs and VLMs with graph-of-thoughts reasoning provides a scalable and effective solution for autonomous reward evolution in RL.
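The two levels described above (graph-based task decomposition followed by iterative, feedback-driven reward refinement) can be illustrated with a toy sketch. This is not the authors' implementation: every name below (`TaskGraph`, `decompose_task`, `evolve_reward`, `toy_train_and_rollout`, `toy_vlm_evaluate`) is hypothetical, and the LLM, VLM, and RL training calls are replaced by deterministic stand-ins so that the control flow can be run and inspected.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class TaskGraph:
    """Text-attributed graph: nodes hold sub-goal text, edges hold ordering."""
    nodes: Dict[str, str] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)


def decompose_task(task: str) -> TaskGraph:
    """Stand-in for the LLM decomposition step (assumption: comma-separated steps)."""
    steps = [s.strip() for s in task.split(",") if s.strip()]
    graph = TaskGraph()
    for i, step in enumerate(steps):
        graph.nodes[f"n{i}"] = step
        if i > 0:
            graph.edges.append((f"n{i - 1}", f"n{i}"))  # chain consecutive sub-goals
    return graph


def evolve_reward(
    task: str,
    train_and_rollout: Callable[[Dict[str, float]], Set[str]],
    evaluate: Callable[[TaskGraph, Set[str]], List[str]],
    max_iters: int = 5,
) -> Tuple[Dict[str, float], int]:
    """Refine per-sub-goal reward weights until the evaluator reports no failures."""
    graph = decompose_task(task)
    weights = {node: 1.0 for node in graph.nodes}  # initial reward: equal weights
    for iteration in range(1, max_iters + 1):
        achieved = train_and_rollout(weights)      # inner level: train a policy, roll out
        failed = evaluate(graph, achieved)         # outer level: automated feedback
        if not failed:
            return weights, iteration
        for node in failed:                        # emphasise sub-goals that failed
            weights[node] += 1.0
    return weights, max_iters


# Toy stand-ins: a sub-goal "succeeds" once its weight reaches a hidden threshold.
REQUIRED = {"n0": 1.0, "n1": 2.0, "n2": 3.0}


def toy_train_and_rollout(weights: Dict[str, float]) -> Set[str]:
    return {n for n, w in weights.items() if w >= REQUIRED[n]}


def toy_vlm_evaluate(graph: TaskGraph, achieved: Set[str]) -> List[str]:
    return [n for n in graph.nodes if n not in achieved]


if __name__ == "__main__":
    final_weights, iters = evolve_reward(
        "reach the handle, grasp the handle, open the door",
        toy_train_and_rollout,
        toy_vlm_evaluate,
    )
    print(final_weights, iters)
```

In a real system the two callables would wrap RL training and a VLM query over rendered rollouts; here they only exercise the loop structure, in which the reward is revised without a human in the loop.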