This paper studies continuous-time q-learning in mean-field jump-diffusion models from the representative agent's perspective. To overcome the challenge that the population distribution may not be directly observable, we introduce the integrated q-function in decoupled form (decoupled Iq-function) and establish its martingale characterization jointly with the value function, which provides a unified policy evaluation rule for both mean-field game (MFG) and mean-field control (MFC) problems. Moreover, depending on whether the task is to solve the MFG or the MFC problem, the decoupled Iq-function can be employed in different ways to learn the mean-field equilibrium policy or the mean-field optimal policy, respectively. As a result, we devise a unified q-learning algorithm for both MFG and MFC problems by utilizing all test policies stemming from the mean-field interactions. For several examples in the jump-diffusion setting, both within and beyond the linear-quadratic (LQ) framework, we obtain exact parameterizations of the decoupled Iq-functions and the value functions, and illustrate our algorithm from the representative agent's perspective with satisfactory performance.