We introduce the first end-to-end Deep Reinforcement Learning (DRL) framework for active high-frequency trading in the stock market. We train DRL agents to trade one unit of Intel Corporation stock using the Proximal Policy Optimization (PPO) algorithm. Training is performed on three contiguous months of high-frequency Limit Order Book (LOB) data, of which the last month serves as validation data. To maximise the signal-to-noise ratio, we build the training set by selecting only the samples with the largest price changes. Testing is then carried out on the following month of data. Hyperparameters are tuned using Sequential Model-Based Optimization. We consider three state characterizations, which differ in their LOB-based meta-features. Analysing the agents' performance on test data, we argue that the agents are able to create a dynamic representation of the underlying environment: they identify occasional regularities in the data and exploit them to create long-term profitable trading strategies. Indeed, the agents learn trading strategies that produce stable positive returns in spite of the highly stochastic and non-stationary environment.
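The sample-selection step described above (keeping only the training samples with the largest price changes) could be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function name `select_largest_moves`, the use of mid-prices, the look-ahead `horizon`, and the `keep_fraction` cut-off are all assumptions introduced here, since the abstract does not specify how price changes are measured or how many samples are kept.

```python
def select_largest_moves(mid_prices, horizon=1, keep_fraction=0.1):
    """Return the start indices of the windows with the largest absolute price change.

    Hypothetical helper: `horizon` (ticks to look ahead) and `keep_fraction`
    (share of windows to keep) are illustrative parameters, not values
    taken from the paper.
    """
    prices = [float(p) for p in mid_prices]
    if not 0 < horizon < len(prices):
        raise ValueError("horizon must be positive and shorter than the series")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must lie in (0, 1]")

    # Absolute price change between tick t and tick t + horizon.
    abs_change = [
        abs(prices[t + horizon] - prices[t])
        for t in range(len(prices) - horizon)
    ]

    # Keep at least one window, otherwise the requested share (rounded down).
    n_keep = max(1, int(keep_fraction * len(abs_change)))

    # Rank windows from largest to smallest move; sorted() is stable,
    # so ties keep their chronological order.
    ranked = sorted(range(len(abs_change)), key=lambda t: abs_change[t], reverse=True)

    # Return the retained indices in chronological order.
    return sorted(ranked[:n_keep])


# Toy mid-price series with two large jumps (100 -> 105 and 105 -> 99).
mid = [100.0, 100.0, 100.0, 105.0, 105.0, 105.0, 99.0, 99.0]
selected = select_largest_moves(mid, horizon=1, keep_fraction=0.3)
print(selected)  # [2, 5]
```

In this toy series only the two windows that start just before a jump are retained, while the flat segments, which carry no directional signal, are discarded. To stay consistent with the setup described above, such a filter would be applied to the training months only, leaving the validation and test months unfiltered.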