We study the sample complexity of reinforcement learning (RL) in Mean-Field Games (MFGs) with model-based function approximation, where strategic exploration is required to find a Nash Equilibrium policy. We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion for characterizing the complexity of the model class. Notably, P-MBED measures the complexity of the single-agent model class converted from the given mean-field model class, and can be exponentially lower than the MBED proposed by \citet{huang2023statistical}. We contribute a model elimination algorithm featuring a novel exploration strategy and establish sample complexity results polynomial w.r.t.~P-MBED. Crucially, our results reveal that, under the basic realizability and Lipschitz continuity assumptions, \emph{learning Nash Equilibrium in MFGs is no more statistically challenging than solving a logarithmic number of single-agent RL problems}. We further extend our results to Multi-Type MFGs, which generalize conventional MFGs by involving multiple types of agents. This extension implies statistical tractability for a broader class of Markov Games via mean-field approximation. Finally, inspired by our theoretical algorithm, we present a heuristic approach with improved computational efficiency and empirically demonstrate its effectiveness.