Order execution is a fundamental task in quantitative finance: it aims to complete the acquisition or liquidation of a number of trading orders for specific assets. Recent advances in model-free reinforcement learning (RL) provide a data-driven solution to the order execution problem. However, existing works optimize execution for an individual order only, overlooking the fact that in practice multiple orders are specified to execute simultaneously, which results in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution that takes practical constraints into account. Specifically, we treat every agent as an individual operator that trades one specific order, while the agents keep communicating with each other and collaborate to maximize the overall profit. Nevertheless, existing MARL algorithms often incorporate communication among agents by exchanging only information about their partial observations, which is inefficient in a complicated financial market. To improve collaboration, we then propose a learnable multi-round communication protocol, in which the agents communicate their intended actions to each other and refine them accordingly. The protocol is optimized through a novel action value attribution method that is provably consistent with the original learning objective yet more efficient. Experiments on data from two real-world markets demonstrate superior performance, with significantly better collaboration effectiveness achieved by our method.