We introduce networked communication to the mean-field game framework, in particular to oracle-free settings where $N$ decentralised agents learn along a single, non-episodic run of the empirical system. We prove that, under only a few reasonable assumptions about network structure, our architecture has sample guarantees bounded between those of the centralised- and independent-learning cases. We discuss how the sample guarantees of the three theoretical algorithms do not in fact guarantee convergence in practice. We then show that in practical settings, where the theoretical parameters are not observed (leading to poor estimation of the Q-function), our communication scheme significantly accelerates convergence over the independent case, and often even over the centralised case, without relying on the assumption of a centralised learner. We contribute further practical enhancements to all three theoretical algorithms, allowing us to present their first empirical demonstrations. Our experiments confirm that we can remove several of the algorithms' theoretical assumptions, and demonstrate the empirical convergence benefits brought by our new networked communication. We additionally show that the networked approach has significant advantages over both the centralised and independent alternatives in terms of robustness to unexpected learning failures and to changes in population size.