High-frequency trading (HFT), which executes algorithmic trades on short time scales, has recently come to occupy the majority of the cryptocurrency market. Besides traditional quantitative trading methods, reinforcement learning (RL) has become another appealing approach for HFT due to its ability to handle high-dimensional financial data and solve sophisticated sequential decision-making problems; \emph{e.g.,} hierarchical reinforcement learning (HRL) has shown promising performance on second-level HFT by training a router that selects a single sub-agent from an agent pool to execute each transaction. However, existing RL methods for HFT still have several shortcomings: 1) standard RL-based trading agents suffer from overfitting, which prevents them from making effective policy adjustments based on the financial context; 2) because market conditions change rapidly, investment decisions made by an individual agent are usually one-sided and highly biased, which can lead to significant losses in extreme markets. To tackle these problems, we propose a novel Memory Augmented Context-aware Reinforcement learning method for HFT, \emph{a.k.a.} MacroHFT, which consists of two training phases: 1) we first train multiple types of sub-agents on market data decomposed according to various financial indicators, specifically market trend and volatility, where each agent is equipped with a conditional adapter that adjusts its trading policy to the prevailing market conditions; 2) we then train a hyper-agent that mixes the decisions of these sub-agents into a consistently profitable meta-policy to handle rapid market fluctuations, aided by a memory mechanism that enhances its decision-making capability. Extensive experiments on various cryptocurrency markets demonstrate that MacroHFT achieves state-of-the-art performance on minute-level trading tasks.
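To make the two-phase design concrete, the following is a minimal, purely illustrative sketch of the sub-agent/hyper-agent structure described above. All class names, the toy linear Q-functions, the feature scaling used as a stand-in for the conditional adapter, and the fixed mixture weights are assumptions for illustration only; MacroHFT's actual networks, market features, and memory mechanism are not specified by this abstract.

\begin{verbatim}
# Hypothetical sketch: sub-agents specialized by market regime plus a
# hyper-agent that mixes their outputs into a meta-policy. Interfaces,
# shapes, and the "adapter" are illustrative assumptions, not the paper's code.
import numpy as np

class SubAgent:
    """One sub-agent, nominally specialized for a market regime
    (e.g., trending vs. volatile) identified when the data was decomposed."""
    def __init__(self, n_actions, rng):
        self.w = rng.normal(size=(8, n_actions))   # toy linear Q-function

    def q_values(self, state, context):
        # A conditional adapter would modulate the policy by market context;
        # here a simple feature rescaling stands in for that idea.
        return (state * (1.0 + 0.1 * context)) @ self.w

class HyperAgent:
    """Mixes sub-agent outputs into a single meta-policy."""
    def __init__(self, n_agents):
        self.mix = np.full(n_agents, 1.0 / n_agents)  # learned in practice

    def act(self, state, context, sub_agents):
        qs = np.stack([a.q_values(state, context) for a in sub_agents])
        meta_q = self.mix @ qs          # weighted mixture of sub-agent Q-values
        return int(np.argmax(meta_q))   # greedy action of the meta-policy

rng = np.random.default_rng(0)
agents = [SubAgent(n_actions=3, rng=rng) for _ in range(4)]
hyper = HyperAgent(n_agents=4)
state, context = rng.normal(size=8), rng.normal(size=8)
print(hyper.act(state, context, agents))  # e.g., 0=sell, 1=hold, 2=buy
\end{verbatim}

Unlike an HRL router that picks exactly one sub-agent per step, the sketch mixes all sub-agents' outputs, mirroring the meta-policy idea in the abstract.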