Multi-agent reinforcement learning (MARL) faces two critical bottlenecks distinct from single-agent RL: credit assignment in cooperative tasks and partial observability of environmental states. We propose LERO, a framework integrating large language models (LLMs) with evolutionary optimization to address these MARL-specific challenges. The solution centers on two LLM-generated components: a hybrid reward function that dynamically allocates individual credit through reward decomposition, and an observation enhancement function that augments partial observations with inferred environmental context. An evolutionary algorithm optimizes these components through iterative MARL training cycles, in which top-performing candidates guide subsequent LLM generations. Evaluations in Multi-Agent Particle Environments (MPE) demonstrate that LERO outperforms baseline methods in both task performance and training efficiency.
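To make the generate-train-evaluate-select cycle concrete, the following is a minimal sketch of the evolutionary loop described above. All helper names (`llm_generate_components`, `train_marl`, `evaluate`) are hypothetical placeholders standing in for the LLM call, the MARL training cycle, and the benchmark evaluation; they are not the paper's actual API.

```python
import random

def llm_generate_components(elites):
    # Placeholder: a real system would prompt an LLM for a (hybrid reward
    # function, observation enhancement function) pair, conditioning the
    # prompt on the top-performing candidates from previous generations.
    return (f"hybrid_reward_v{random.randint(0, 99)}",
            f"obs_enhance_v{random.randint(0, 99)}")

def train_marl(reward_fn, obs_fn):
    # Placeholder: train agents with the hybrid reward (per-agent credit
    # via reward decomposition) and the enhanced observations (inferred
    # environmental context); returns the trained joint policy.
    return {"reward_fn": reward_fn, "obs_fn": obs_fn}

def evaluate(policy):
    # Placeholder: return a task-performance score, e.g. from MPE rollouts.
    return random.random()

def lero_search(generations=5, population=8, top_k=2):
    """Evolve LLM-generated (reward_fn, obs_fn) pairs via MARL training."""
    elites = []  # (score, reward_fn, obs_fn) tuples, best first
    for _ in range(generations):
        candidates = [llm_generate_components(elites) for _ in range(population)]
        scored = [(evaluate(train_marl(r, o)), r, o) for r, o in candidates]
        # Keep the top performers to guide the next round of LLM generation.
        elites = sorted(scored, key=lambda c: c[0], reverse=True)[:top_k]
    return elites[0]

if __name__ == "__main__":
    print(lero_search())
```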