We employ deep reinforcement learning (RL) to train an agent to translate a high-frequency trading signal into a trading strategy that places individual limit orders. Based on the ABIDES limit order book simulator, we build a reinforcement learning OpenAI gym environment and use it to simulate a realistic trading environment for NASDAQ equities from historical order book messages. To train a trading agent that learns to maximise its trading return in this environment, we use Deep Duelling Double Q-learning with the APEX (asynchronous prioritised experience replay) architecture. The agent observes the current limit order book state, its recent history, and a short-term directional forecast. To investigate the performance of RL for adaptive trading independently of any concrete forecasting algorithm, we evaluate our approach with synthetic alpha signals obtained by perturbing forward-looking returns with varying levels of noise. We find that the RL agent learns an effective strategy for inventory management and order placement that outperforms a heuristic benchmark trading strategy with access to the same signal.
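To make the idea of a synthetic alpha signal concrete, the following is a minimal sketch of one way to perturb forward-looking returns with noise. It is an illustration under stated assumptions, not the construction used in this work: the function name `synthetic_alpha`, the use of log returns on mid-prices, the additive Gaussian noise, and the scaling of that noise by the standard deviation of the clean returns are all hypothetical choices.

```python
import numpy as np


def synthetic_alpha(mid_prices, horizon, noise_level, rng):
    """Return (clean, noisy) forward-looking returns.

    Illustrative only: additive Gaussian noise on forward log returns is an
    assumption made for this sketch, not the paper's stated construction.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    mid = np.asarray(mid_prices, dtype=float)

    # Forward log return over `horizon` steps. The last `horizon` entries
    # have no future price, so they stay NaN.
    fwd = np.full(mid.shape, np.nan)
    fwd[:-horizon] = np.log(mid[horizon:] / mid[:-horizon])

    # Noise is scaled relative to the spread of the clean signal, so
    # noise_level = 0 gives a perfect forecast and larger values degrade it.
    scale = noise_level * np.nanstd(fwd)
    noisy = fwd + rng.normal(0.0, scale, size=mid.shape)
    return fwd, noisy
```

A directional forecast could then be derived from the noisy values, for example with `np.sign(noisy)`. Sweeping `noise_level` gives a family of signals of decreasing quality, which is what allows a trading agent to be evaluated without committing to a particular forecasting model.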