We explore the problem of imitation learning (IL) in the context of mean-field games (MFGs), where the goal is to imitate the behavior of a population of agents following a Nash equilibrium policy according to some unknown payoff function. IL in MFGs presents new challenges compared to single-agent IL, particularly when both the reward function and the transition kernel depend on the population distribution. In this paper, departing from the existing literature on IL for MFGs, we introduce a new solution concept called the Nash imitation gap. We then show that when only the reward depends on the population distribution, IL in MFGs can be reduced to single-agent IL with similar guarantees. However, when the dynamics are population-dependent, we provide a novel upper bound that suggests IL is harder in this setting. To address this issue, we propose a new adversarial formulation in which the reinforcement learning problem is replaced by a mean-field control (MFC) problem, suggesting that progress in IL within MFGs may have to build upon MFC.