This paper studies continuous-time q-learning in mean-field jump-diffusion models when the population distribution is not directly observable. We propose the integrated q-function in decoupled form (decoupled Iq-function) from the representative agent's perspective and establish its martingale characterization, which provides a unified policy evaluation rule for both mean-field game (MFG) and mean-field control (MFC) problems. Moreover, we consider the learning procedure in which the representative agent updates the population distribution based on his own state values. Depending on whether the task is to solve the MFG or the MFC problem, the decoupled Iq-function can be employed differently to characterize the mean-field equilibrium policy or the mean-field optimal policy, respectively. Based on these theoretical findings, we devise a unified q-learning algorithm for both MFG and MFC problems by utilizing test policies and the averaged martingale orthogonality condition. For several financial applications in the jump-diffusion setting, we obtain exact parameterizations of the decoupled Iq-functions and the value functions, and illustrate our q-learning algorithm with satisfactory performance.
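As a point of reference, the martingale-based policy evaluation rule can be sketched in the spirit of Jia and Zhou's continuous-time q-learning; the display below is a hedged illustration, with $J$, $q$, $r$, $\beta$, the state process $X_t$, the actions $a_t$, the test processes $\xi_t$ and the estimated distribution flow $\{\mu_s\}$ serving as generic placeholders rather than the precise objects defined in the paper:
\[
  M_t := e^{-\beta t}\, J(t, X_t, \mu_t)
  + \int_0^t e^{-\beta s}\,\bigl[\, r(s, X_s, \mu_s, a_s) - q(s, X_s, \mu_s, a_s) \,\bigr]\,\mathrm{d}s .
\]
In this sketch, $(J, q)$ is consistent with the behavior policy whenever $M_t$ is a martingale, and in implementation this is enforced through averaged orthogonality conditions of the form $\mathbb{E}\int_0^T \xi_t\,\mathrm{d}M_t = 0$ for suitable test processes $\xi_t$.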