In complex multi-agent environments, achieving efficient learning and desirable behaviours is a significant challenge for Multi-Agent Reinforcement Learning (MARL) systems. This work explores combining MARL with Large Language Model (LLM)-mediated interventions to guide agents toward more desirable behaviours. Specifically, we investigate how LLMs can be used to interpret and facilitate interventions that shape the learning trajectories of multiple agents. We experiment with two types of interventions, referred to as controllers: a Natural Language (NL) Controller, which uses a small (7B/8B) LLM to simulate human-like interventions, and a Rule-Based (RB) Controller. The RB Controller has a stronger impact than the NL Controller. Our findings indicate that agents benefit particularly from early interventions, which lead to more efficient training and higher performance. Both intervention types outperform the baseline without interventions, highlighting the potential of LLM-mediated guidance to accelerate training and enhance MARL performance in challenging environments.