We study the sample complexity of reinforcement learning (RL) in Mean-Field Games (MFGs) with model-based function approximation, a setting that requires strategic exploration to find a Nash Equilibrium policy. We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion for characterizing the complexity of the model class. Notably, P-MBED measures the complexity of the single-agent model class converted from the given mean-field model class, and it can potentially be exponentially lower than the MBED proposed by \citet{huang2023statistical}. We contribute a model elimination algorithm featuring a novel exploration strategy and establish sample complexity results polynomial w.r.t.~P-MBED. Crucially, our results reveal that, under the basic realizability and Lipschitz continuity assumptions, \emph{learning Nash Equilibrium in MFGs is no more statistically challenging than solving a logarithmic number of single-agent RL problems}. We further extend our results to Multi-Type MFGs, which generalize conventional MFGs to settings with multiple types of agents. This extension implies the statistical tractability of a broader class of Markov Games through the efficacy of mean-field approximation. Finally, inspired by our theoretical algorithm, we present a heuristic approach with improved computational efficiency and empirically demonstrate its effectiveness.