We introduce networked communication to mean-field control (MFC) - the cooperative counterpart to mean-field games (MFGs) - and in particular to the setting where decentralised agents learn online from a single, non-episodic run of the empirical system. We adapt recent MFG algorithms to this new setting, and contribute a novel sub-routine that allows networked agents to estimate the global average reward from their local neighbourhood. We show that the networked communication scheme allows agents to increase social welfare faster than under either the centralised or independent architecture, by computing a population of potential updates in parallel and then propagating the highest-performing ones through the population, via a method that can also be seen as tackling the credit-assignment problem. We prove this result theoretically and support it with experiments across numerous games, also exploring the empirical finding that smaller communication radii can benefit convergence in a particular class of games while still outperforming agents that learn entirely independently. We provide numerous ablation studies, as well as additional experiments on the number of communication rounds and robustness to communication failures.
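As a rough illustration of the kind of mechanism the abstract alludes to (not the paper's actual sub-routine), the following minimal sketch shows how networked agents might estimate the global average reward by repeatedly averaging with their communication neighbours; all names (gossip_average, adjacency, n_rounds) are hypothetical, and exact convergence to the global mean is only guaranteed here for regular communication graphs, where the mixing matrix is doubly stochastic.

```python
# A hedged sketch (not the paper's sub-routine) of neighbourhood-based
# consensus averaging for estimating the global average reward.
import numpy as np

def gossip_average(rewards, adjacency, n_rounds=10):
    """Each agent repeatedly averages its estimate with its neighbours'.

    rewards:   (n,) array of each agent's locally observed reward.
    adjacency: (n, n) symmetric 0/1 matrix; adjacency[i, j] = 1 if agents
               i and j can communicate (self-loops included).
    Returns:   (n,) array of per-agent estimates of the global mean.
    """
    estimates = rewards.astype(float).copy()
    degrees = adjacency.sum(axis=1)
    for _ in range(n_rounds):
        # Each agent replaces its estimate with the mean over its
        # communication neighbourhood (including itself).
        estimates = (adjacency @ estimates) / degrees
    return estimates

# Example: a ring of 6 agents with self-loops (a regular graph), whose
# estimates converge toward the true global mean of 2.5.
n = 6
A = np.eye(n)
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1
r = np.array([0.0, 1.0, 4.0, 2.0, 3.0, 5.0])
print(r.mean(), gossip_average(r, A, n_rounds=50))
```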