Coordinating large populations of interacting agents is a central challenge in multi-agent reinforcement learning (MARL), where the joint state-action space grows exponentially with the number of agents. Mean-field methods alleviate this burden by aggregating agent interactions, but they assume those interactions are homogeneous. Recent graphon-based frameworks capture heterogeneity, yet become computationally expensive as the number of agents grows. To address this, we introduce $\texttt{GMFS}$, a $\textbf{G}$raphon $\textbf{M}$ean-$\textbf{F}$ield $\textbf{S}$ubsampling framework for scalable cooperative MARL with heterogeneous agent interactions. By subsampling $\kappa$ agents in proportion to interaction strength, we approximate the graphon-weighted mean field and learn a policy with sample complexity $\mathrm{poly}(\kappa)$ and optimality gap $O(1/\sqrt{\kappa})$. Numerical simulations on robotic coordination tasks verify our theory, showing that $\texttt{GMFS}$ achieves near-optimal performance.
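The core subsampling idea can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the paper's implementation: it draws $\kappa$ neighbors with probability proportional to their graphon interaction strength and averages their states, which gives an unbiased estimate of the graphon-weighted mean field with error decaying like $O(1/\sqrt{\kappa})$. The function name, array shapes, and feature-vector state representation are all assumptions made for the example.

```python
import numpy as np

def graphon_subsample_mean_field(states, weights, kappa, rng=None):
    """Hypothetical sketch: estimate a graphon-weighted mean field for one
    agent by subsampling kappa neighbors in proportion to interaction strength.

    states  : (N, d) array of neighbor state features
    weights : (N,) graphon interaction strengths W(x_i, x_j) for this agent
    kappa   : number of neighbors to subsample
    """
    if rng is None:
        rng = np.random.default_rng()
    # Sampling proportional to weight makes the plain average below an
    # unbiased estimate of the weighted mean sum(w_j s_j) / sum(w_j).
    probs = weights / weights.sum()
    idx = rng.choice(len(weights), size=kappa, replace=True, p=probs)
    # Empirical mean over kappa sampled neighbors; Monte Carlo error
    # shrinks like O(1/sqrt(kappa)), independent of N.
    return states[idx].mean(axis=0)

# Toy usage: 1000 neighbors with 3-dimensional state features.
rng = np.random.default_rng(1)
states = rng.normal(size=(1000, 3))
weights = rng.uniform(0.1, 1.0, size=1000)
mf = graphon_subsample_mean_field(states, weights, kappa=64, rng=rng)
print(mf.shape)  # prints (3,)
```

Only the $\kappa$ sampled neighbors enter the estimate, which is what keeps the per-update cost $\mathrm{poly}(\kappa)$ rather than scaling with the full population size.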