Stock trading strategies play a critical role in investment. However, it is challenging to design a profitable strategy in a complex and dynamic stock market. In this paper, we propose an ensemble strategy that employs deep reinforcement learning schemes to learn a stock trading strategy by maximizing investment return. We train a deep reinforcement learning agent and obtain an ensemble trading strategy using three actor-critic based algorithms: Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), and Deep Deterministic Policy Gradient (DDPG). The ensemble strategy inherits and integrates the best features of the three algorithms, and thereby adjusts robustly to changing market conditions. To avoid the large memory consumption of training networks with a continuous action space, we employ a load-on-demand technique for processing very large data. We test our algorithms on the 30 Dow Jones constituent stocks, which have adequate liquidity. The performance of the trading agent under each reinforcement learning algorithm is evaluated and compared with both the Dow Jones Industrial Average index and the traditional min-variance portfolio allocation strategy. The proposed deep ensemble strategy outperforms the three individual algorithms and the two baselines in terms of risk-adjusted return as measured by the Sharpe ratio. This work is fully open-sourced at \href{https://github.com/AI4Finance-Foundation/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020}{GitHub}.
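For concreteness, the sketch below illustrates one way such an ensemble can be realized: train PPO, A2C, and DDPG agents on the same training window, score each on a validation window by its Sharpe ratio, and trade the next period with the best-scoring agent. This is a minimal sketch under stated assumptions, not the implementation in the linked repository; the gymnasium-style trading environment and the \texttt{portfolio\_return} field of \texttt{info} are hypothetical placeholders.

\begin{verbatim}
# Minimal sketch of the ensemble idea (assumptions noted above), using the
# publicly available stable-baselines3 implementations of PPO, A2C, and DDPG.
import numpy as np
from stable_baselines3 import A2C, DDPG, PPO


def sharpe_ratio(daily_returns, risk_free=0.0):
    """Annualized Sharpe ratio of a series of daily returns (252 trading days)."""
    excess = np.asarray(daily_returns) - risk_free
    return np.sqrt(252) * excess.mean() / (excess.std() + 1e-8)


def run_episode(model, env):
    """Roll one episode and collect per-step portfolio returns."""
    obs, _ = env.reset()
    returns, done = [], False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        # "portfolio_return" is an assumed key reported by the trading env.
        returns.append(info["portfolio_return"])
        done = terminated or truncated
    return returns


def pick_agent(train_env, val_env, timesteps=50_000):
    """Train the three actor-critic agents; keep the best validation Sharpe."""
    candidates = {
        "PPO": PPO("MlpPolicy", train_env, verbose=0),
        "A2C": A2C("MlpPolicy", train_env, verbose=0),
        "DDPG": DDPG("MlpPolicy", train_env, verbose=0),
    }
    scores = {}
    for name, model in candidates.items():
        model.learn(total_timesteps=timesteps)
        scores[name] = sharpe_ratio(run_episode(model, val_env))
    best = max(scores, key=scores.get)
    return candidates[best], scores
\end{verbatim}

In practice this selection would be repeated on a rolling window, so the agent actually used for trading can change as market conditions change.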