Recent advances in the mean-field game literature enable the reduction of large-scale multi-agent problems to tractable interactions between a representative agent and a population distribution. However, existing approaches typically assume a fixed initial population distribution and fully rational agents, which limits robustness under distributional uncertainty and cognitive constraints. We address these limitations by introducing risk aversion with respect to the initial population distribution and by incorporating bounded rationality to model deviations from fully rational decision-making. Combining these two elements yields a new, more general equilibrium concept, which we term the mean-field risk-averse quantal response equilibrium (MF-RQE). We establish the existence of MF-RQE and prove that both fixed-point iteration and fictitious play converge to it. Building on these results, we develop a scalable reinforcement learning algorithm for settings with large state-action spaces. Numerical experiments demonstrate that MF-RQE policies are more robust than classical mean-field approaches, which optimize expected cumulative rewards under a fixed initial distribution and are restricted to entropy-based regularizers.
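To make the fixed-point route to such an equilibrium concrete, the following is a minimal sketch of damped fixed-point iteration on a quantal (logit) response in a toy static congestion-style mean-field game. Everything here is an illustrative assumption rather than the paper's algorithm: the game, the function and parameter names (`base_utility`, `congestion_cost`, `lam`, `damping`) are invented for exposition, and the risk-averse treatment of the initial distribution is omitted for brevity.

```python
import numpy as np

def quantal_response(payoffs, lam):
    """Logit (softmax) response with rationality parameter lam:
    pi(a) proportional to exp(lam * payoff(a)); lam -> inf recovers a best response."""
    z = lam * (payoffs - payoffs.max())  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def congestion_payoffs(mu, base_utility, congestion_cost):
    """Toy mean-field reward: intrinsic utility of each action minus a cost
    that grows with the fraction mu[a] of the population choosing it."""
    return base_utility - congestion_cost * mu

def fixed_point_iteration(base_utility, congestion_cost, lam=5.0,
                          damping=0.5, tol=1e-10, max_iter=10_000):
    """Damped iteration mu <- (1 - d) * mu + d * QR(payoffs(mu)) until the
    population distribution is (approximately) a fixed point of the response map."""
    n = len(base_utility)
    mu = np.full(n, 1.0 / n)  # start from the uniform population distribution
    for _ in range(max_iter):
        pi = quantal_response(
            congestion_payoffs(mu, base_utility, congestion_cost), lam)
        mu_next = (1 - damping) * mu + damping * pi
        if np.abs(mu_next - mu).max() < tol:
            return mu_next
        mu = mu_next
    return mu

if __name__ == "__main__":
    base_utility = np.array([1.0, 0.8, 0.5])  # intrinsic value of each action
    congestion_cost = 2.0                     # penalty for crowded actions
    mu_star = fixed_point_iteration(base_utility, congestion_cost)
    print("equilibrium population distribution:", mu_star)
```

At the fixed point, the population distribution coincides with the quantal response it induces; the damping step is one standard way to stabilize such iterations when the undamped map is not contractive.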