Multi-agent reinforcement learning (MARL) is crucial for AI systems that operate collaboratively in distributed and adversarial settings, particularly in multi-domain operations (MDO). A central challenge in cooperative MARL is determining how agents should coordinate. Existing approaches hand-specify the graph topology, rely on proximity-based heuristics, or learn the structure entirely from environment interaction; each of these options is brittle, semantically uninformed, or data-intensive. We investigate whether large language models (LLMs) can generate useful coordination graph priors for MARL by inferring latent coordination patterns from minimal natural language descriptions of agent observations. These priors are integrated into MARL algorithms via graph convolutional layers within a graph neural network (GNN)-based pipeline, and evaluated on four cooperative scenarios from the Multi-Agent Particle Environment (MPE) benchmark against baselines spanning the full spectrum of coordination modeling, from independent learners to state-of-the-art graph-based methods. We further run an ablation across five compact open-source LLMs to assess how sensitive prior quality is to model choice. Our results provide the first quantitative evidence that LLM-derived graph priors can enhance coordination and adaptability in dynamic multi-agent environments, and demonstrate that models as small as 1.5B parameters are sufficient for effective prior generation.
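To make the integration step concrete, the following is a minimal sketch of how a coordination graph prior could feed a single graph convolutional layer. It is not the authors' implementation: the adjacency matrix `prior`, the toy agent features, the weight matrix, and the symmetric normalization with self-loops (a common choice for graph convolutions) are all assumptions made for illustration. In practice the prior would come from an LLM reading agent observation descriptions, and the layer would sit inside a trained GNN.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the symmetric normalization with self-loops."""
    n = len(adj)
    # Add self-loops so every agent keeps its own features (and every degree is >= 1).
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def graph_conv(features, adj, weight):
    """One graph convolution step: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(normalize_adjacency(adj), features), weight)
    return [[max(value, 0.0) for value in row] for row in hidden]


# Hypothetical prior for three agents: agent 0 coordinates with agents 1 and 2.
prior = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
# Toy per-agent observation embeddings (assumed, 2 features per agent).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Toy weight matrix mapping 2 input features to 1 output feature (assumed, not learned).
weight = [[1.0], [1.0]]

mixed = graph_conv(features, prior, weight)
for agent_id, row in enumerate(mixed):
    print(agent_id, row)
```

Each output row mixes an agent's own features with those of the neighbors named in the prior, so agent 0 aggregates information from both other agents while agents 1 and 2 only hear from agent 0.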