In this paper, we study the fundamental statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Games (MFG) with general model-based function approximation. We introduce a new concept called Mean-Field Model-Based Eluder Dimension (MF-MBED), which characterizes the inherent complexity of mean-field model classes, and we show that a rich family of Mean-Field RL problems has low MF-MBED. Additionally, we propose algorithms based on maximum likelihood estimation that return an $\epsilon$-optimal policy for MFC or an $\epsilon$-Nash Equilibrium policy for MFG. The overall sample complexity depends only polynomially on MF-MBED, which can be much smaller than the size of the state-action space. Compared with previous works, our results require only minimal assumptions: realizability and Lipschitz continuity.
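As a rough illustration of the maximum-likelihood estimation step, the sketch below builds an MLE confidence set (version space) over a finite candidate model class, a standard construction in model-based RL. The `log_prob` interface, the radius `beta`, and the finiteness of the class are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
import numpy as np

def mle_version_space(model_class, dataset, beta):
    """Return the MLE confidence set: all candidate models whose total
    log-likelihood is within `beta` of the maximum over the class.

    model_class : list of candidate transition models; each is assumed to
                  expose log_prob(s_next, s, a, mu) -- a hypothetical
                  interface, where `mu` is the observed mean-field
                  (population state distribution).
    dataset     : list of observed transitions (s, a, mu, s_next).
    beta        : log-likelihood radius; in theory it is set via a union
                  bound so the true model stays in the set w.h.p.
    """
    # Total log-likelihood of the dataset under each candidate model.
    log_liks = np.array([
        sum(m.log_prob(s_next, s, a, mu) for (s, a, mu, s_next) in dataset)
        for m in model_class
    ])
    best = log_liks.max()
    # Keep every model statistically indistinguishable from the MLE.
    return [m for m, ll in zip(model_class, log_liks) if ll >= best - beta]
```

Planning optimistically (or computing an approximate Nash Equilibrium) within such a version space, and shrinking it as data accumulates, is the usual pattern behind guarantees of this kind; the MF-MBED bound controls how many rounds of shrinkage are needed.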