In this work, we attempt to bridge the fields of finite-agent and infinite-agent games by studying how the optimal policies of agents evolve with the number of agents (the population size) in mean-field games. This agent-centric perspective contrasts with existing works, which typically focus on the convergence of the empirical distribution of the population. The premise is to obtain the optimal policies of a set of finite-agent games with different population sizes. However, deriving a closed-form solution for each game is theoretically intractable, training a distinct policy for each game is computationally intensive, and directly applying a policy trained in one game to other games is sub-optimal. We address these challenges with Population-size-Aware Policy Optimization (PAPO). Our contributions are three-fold. First, to efficiently generate effective policies for games with different population sizes, we propose PAPO, which unifies two natural options (augmentation and hypernetwork) and achieves significantly better performance. PAPO consists of three components: i) a population-size encoding, which transforms the original value of the population size into an equivalent encoding to avoid training collapse, ii) a hypernetwork that generates a distinct policy for each game conditioned on the population size, and iii) the population size as an additional input to the generated policy. Second, we construct a multi-task-based training procedure that efficiently trains the neural networks of PAPO by sampling data from multiple games with different population sizes. Finally, extensive experiments on multiple environments show that PAPO significantly outperforms the baselines, and an analysis of how the generated policies evolve with population size deepens our understanding of the two fields of finite-agent and infinite-agent games.
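The three PAPO components listed in the abstract, together with the multi-task sampling step, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the binary encoding, the layer sizes, the linear generated policy, and every name (`encode_population_size`, `PAPOSketch`, `sample_population_sizes`) are assumptions made for illustration, and no training loop is shown.

```python
import numpy as np


def encode_population_size(n, n_bits=16):
    """Component i): map the raw population size to a bounded encoding.

    A binary encoding (least significant bit first) is assumed here; the
    abstract only says the encoding is "equivalent" to the original value.
    """
    if not 0 < n < 2 ** n_bits:
        raise ValueError("population size out of range for n_bits")
    return np.array([(int(n) >> i) & 1 for i in range(n_bits)], dtype=np.float64)


class PAPOSketch:
    """Hypothetical forward pass: a hypernetwork generates a linear policy."""

    def __init__(self, state_dim, n_actions, n_bits=16, hyper_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.n_bits = n_bits
        # Component iii): the generated policy sees the state AND the encoding.
        self.in_dim = state_dim + n_bits
        self.n_actions = n_actions
        n_policy_params = self.in_dim * n_actions + n_actions
        # Hypernetwork parameters (these would be the trainable weights).
        self.h_w1 = rng.normal(0.0, 0.1, (n_bits, hyper_hidden))
        self.h_b1 = np.zeros(hyper_hidden)
        self.h_w2 = rng.normal(0.0, 0.1, (hyper_hidden, n_policy_params))
        self.h_b2 = np.zeros(n_policy_params)

    def generate_policy(self, n):
        """Component ii): produce policy weights conditioned on population size."""
        code = encode_population_size(n, self.n_bits)
        hidden = np.tanh(code @ self.h_w1 + self.h_b1)
        flat = hidden @ self.h_w2 + self.h_b2
        split = self.in_dim * self.n_actions
        weight = flat[:split].reshape(self.in_dim, self.n_actions)
        bias = flat[split:]
        return code, weight, bias

    def action_probs(self, state, n):
        """Run the generated policy on the state augmented with the encoding."""
        code, weight, bias = self.generate_policy(n)
        x = np.concatenate([np.asarray(state, dtype=np.float64), code])
        logits = x @ weight + bias
        logits = logits - logits.max()  # subtract the max for numerical stability
        exp = np.exp(logits)
        return exp / exp.sum()


def sample_population_sizes(sizes, batch_size, rng):
    """Multi-task style sampling: draw one population size per rollout."""
    return rng.choice(sizes, size=batch_size)
```

For example, `PAPOSketch(state_dim=4, n_actions=3).action_probs(np.zeros(4), 10)` returns a probability vector over three actions for a game with 10 agents, and calling it with a different population size yields a different generated policy from the same hypernetwork.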