In this paper, we study the statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Games (MFG) with general function approximation. We introduce a new concept called the Mean-Field Model-Based Eluder Dimension (MBED), which subsumes a rich family of Mean-Field RL problems. Additionally, we propose algorithms based on Optimistic Maximal Likelihood Estimation, which return an $\epsilon$-optimal policy for MFC or an $\epsilon$-Nash Equilibrium policy for MFG, with sample complexity that is polynomial in the relevant parameters and independent of the number of states, the number of actions, and the number of agents. Notably, our results require only a mild assumption of Lipschitz continuity on the transition dynamics and avoid the strong structural assumptions made in previous work. Finally, in the tabular setting, given access to a generative model, we establish an exponential lower bound for the MFC setting, while providing a novel sample-efficient model elimination algorithm to approximate equilibrium in the MFG setting. Our results reveal a fundamental separation between single-agent RL, MFC, and MFG from the perspective of sample efficiency.