We study a game between liquidity provider and liquidity taker agents interacting in an over-the-counter market, of which foreign exchange is the typical example. We show how a suitable design of parameterized families of reward functions, coupled with shared policy learning, constitutes an efficient solution to this problem. By playing against each other, our deep-reinforcement-learning-driven agents learn emergent behaviors corresponding to a wide spectrum of objectives encompassing profit-and-loss, optimal execution and market share. In particular, we find that liquidity providers naturally learn to balance hedging and skewing, where skewing refers to setting their buy and sell prices asymmetrically as a function of their inventory. We further introduce a novel RL-based calibration algorithm, which we found to perform well at imposing constraints on the game equilibrium. On the theoretical side, we establish convergence rates for our multi-agent policy gradient algorithm under a transitivity assumption closely related to generalized ordinal potential games.
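To make the notion of skewing concrete, the snippet below is a minimal hand-written sketch, not the learned policy described above. It assumes a hypothetical linear rule in which both quotes are shifted against the liquidity provider's inventory by `skew_coef * inventory`; the function name and parameters (`mid`, `half_spread`, `skew_coef`) are illustrative assumptions. A long provider lowers both prices, making its sell price more attractive to clients and its buy price less so, which tends to reduce the position without an explicit hedge.

```python
def skewed_quotes(mid: float, half_spread: float, inventory: float,
                  skew_coef: float) -> tuple[float, float]:
    """Return (bid, ask) quotes skewed against the current inventory.

    Hypothetical linear skew rule (an assumption for illustration):
    when inventory > 0 (long), both quotes move down so clients are
    encouraged to buy from the provider; when inventory < 0 (short),
    both quotes move up so clients are encouraged to sell to it.
    The quoted spread (ask - bid) stays equal to 2 * half_spread.
    """
    shift = -skew_coef * inventory          # price offset applied to both sides
    bid = mid - half_spread + shift         # price at which the provider buys
    ask = mid + half_spread + shift         # price at which the provider sells
    return bid, ask


if __name__ == "__main__":
    # Flat inventory: quotes are symmetric around the mid price.
    print(skewed_quotes(100.0, 0.5, 0.0, 0.1))
    # Long inventory: quotes are shifted down, i.e. asymmetric around mid.
    print(skewed_quotes(100.0, 0.5, 2.0, 0.1))
```

In this sketch the spread is held fixed and only the quote center moves; a learned policy may of course also widen or tighten the spread and trade off skewing against hedging in the inter-dealer market.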