We investigate the potential of Multi-Objective Deep Reinforcement Learning for single-asset trading of stocks and cryptocurrencies. In particular, we consider a Multi-Objective algorithm that generalizes over the reward functions and the discount factor (i.e., these components are not specified a priori, but are incorporated into the learning process). First, using several important assets (the cryptocurrency pairs BTCUSD, ETHUSDT, and XRPUSDT, and the stock-market instruments AAPL, SPY, and NIFTY50), we verify the reward generalization property of the proposed Multi-Objective algorithm, and provide preliminary statistical evidence of increased predictive stability over the corresponding Single-Objective strategy. Second, we show that the Multi-Objective algorithm has a clear edge over the corresponding Single-Objective strategy when the reward mechanism is sparse (i.e., when non-null feedback is infrequent over time). Finally, we discuss the generalization properties with respect to the discount factor. All of our code is provided as open source.
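To make the notion of a sparse reward mechanism concrete, the following minimal sketch contrasts a dense per-step reward with a sparse one for a single asset. It is an illustration under stated assumptions, not the reward design used in this work: the function names `dense_rewards` and `sparse_rewards`, the use of log-returns, and the rule that the sparse reward is paid only when the position changes or the episode ends are all hypothetical choices made for this example.

```python
import math
from typing import List


def dense_rewards(prices: List[float], positions: List[int]) -> List[float]:
    """Hypothetical dense reward: at every step, the position held over
    [t, t+1] (-1 short, 0 flat, +1 long) times that step's log-return."""
    return [
        positions[t] * math.log(prices[t + 1] / prices[t])
        for t in range(len(prices) - 1)
    ]


def sparse_rewards(prices: List[float], positions: List[int]) -> List[float]:
    """Hypothetical sparse reward: the same log-returns are accrued silently
    and paid out only when the position changes or the episode ends."""
    rewards: List[float] = []
    accrued = 0.0
    last_step = len(prices) - 2
    for t in range(len(prices) - 1):
        accrued += positions[t] * math.log(prices[t + 1] / prices[t])
        # Pay out at the final step or when the next position differs.
        pays_out = t == last_step or positions[t + 1] != positions[t]
        rewards.append(accrued if pays_out else 0.0)
        if pays_out:
            accrued = 0.0
    return rewards


# Toy price path (assumed data): long for two steps, then flat.
prices = [100.0, 110.0, 121.0, 121.0]
positions = [1, 1, 0]
dense = dense_rewards(prices, positions)
sparse = sparse_rewards(prices, positions)
```

In this toy path both schemes deliver the same total reward, but the sparse scheme gives non-null feedback at a single step only, which is the regime in which the abstract reports an advantage for the Multi-Objective algorithm.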