In this paper, we study the statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Games (MFG) with general function approximation. We introduce a new concept called the Mean-Field Model-Based Eluder Dimension (MBED), which subsumes a rich family of Mean-Field RL problems. Additionally, we propose algorithms based on Optimistic Maximal Likelihood Estimation, which can return an $\epsilon$-optimal policy for MFC or an $\epsilon$-Nash Equilibrium policy for MFG, with sample complexity polynomial in the relevant parameters and independent of the numbers of states, actions, and agents. Notably, our results require only a mild assumption of Lipschitz continuity on the transition dynamics and avoid the strong structural assumptions made in previous work. Finally, in the tabular setting, given access to a generative model, we establish an exponential lower bound for the MFC setting, while providing a novel sample-efficient model elimination algorithm to approximate equilibrium in the MFG setting. Our results reveal a fundamental separation between single-agent RL, MFC, and MFG from the sample efficiency perspective.