A clearer understanding of when coordination emerges, fluctuates, or collapses in decentralized multi-agent reinforcement learning (MARL) is needed to characterize the dynamics of multi-agent learning systems. We revisit fully independent Q-learning (IQL) as a minimal decentralized testbed and run large-scale experiments across environment size L and agent density ρ. We construct a phase map whose two axes are the cooperative success rate (CSR) and a stability index derived from TD-error variance; the map reveals three distinct regimes: a coordinated and stable phase, a fragile transition region, and a jammed or disordered phase. A sharp double Instability Ridge separates these regimes and coincides with persistent kernel drift, the time-varying shift of each agent's effective transition kernel induced by the other agents' policy updates. Synchronization analysis further shows that temporal alignment is required for sustained cooperation, and that the competition between drift and synchronization generates the fragile regime. Removing agent identifiers eliminates drift entirely and collapses the three-phase structure, demonstrating that small inter-agent asymmetries are a necessary driver of drift. Overall, the results show that decentralized MARL exhibits a coherent phase structure governed by the interaction of scale, density, and kernel drift, suggesting that emergent coordination behaves as a distribution-interaction-driven phase phenomenon.
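As a minimal sketch of how the two phase-map axes could be computed from an IQL run, the snippet below estimates CSR as the fraction of evaluation episodes with joint cooperative success, and a stability index as an inverse function of windowed TD-error variance. The function names, the window size, the success criterion, and the specific mapping 1/(1+Var) are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def cooperative_success_rate(episode_outcomes):
    """CSR (assumed definition): fraction of evaluation episodes in
    which all agents jointly reach the cooperative goal."""
    return np.asarray(episode_outcomes, dtype=float).mean()

def stability_index(td_errors, window=1000):
    """Stability index from TD-error variance (assumed form):
    take the variance of each agent's TD errors over the most recent
    window, average across agents, and map to (0, 1] via 1/(1+Var),
    so low variance (stable learning) gives an index near 1."""
    td = np.asarray(td_errors)                 # shape: (n_agents, T)
    n_agents, T = td.shape
    vars_per_agent = [
        np.var(td[a, max(0, T - window):])     # variance on the last window
        for a in range(n_agents)
    ]
    return 1.0 / (1.0 + np.mean(vars_per_agent))

# Example: two agents with 5000 recorded TD errors each
rng = np.random.default_rng(0)
td_errors = rng.normal(scale=[[0.1], [0.3]], size=(2, 5000))
outcomes = rng.random(200) < 0.8               # 80% cooperative successes
print(cooperative_success_rate(outcomes))      # ~0.8
print(stability_index(td_errors))              # near 1 for low variance
```

Plotted over the (L, ρ) grid, two such scalars per run would suffice to draw the phase map described above.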