Traditional Long Short-Term Memory (LSTM) networks are effective at handling sequential data but suffer from limitations such as vanishing gradients and difficulty capturing long-term dependencies, which can degrade their performance in dynamic, high-risk environments like stock trading. To address these limitations, this study explores the use of the recently introduced Extended Long Short-Term Memory (xLSTM) network in combination with a deep reinforcement learning (DRL) approach for automated stock trading. Our proposed method uses xLSTM networks in both the actor and the critic components, enabling effective handling of time series data and dynamic market environments. Proximal Policy Optimization (PPO), with its ability to balance exploration and exploitation, is employed to optimize the trading strategy. Experiments were conducted on financial data from major tech companies over an extended time span, and they show that the xLSTM-based model outperforms LSTM-based methods on key trading evaluation metrics, including cumulative return, average profitability per trade, maximum earning rate, maximum pullback, and Sharpe ratio. These findings highlight the potential of xLSTM for enhancing DRL-based stock trading systems.
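For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how three of them (cumulative return, maximum pullback, and Sharpe ratio) are conventionally computed from an equity curve. The abstract does not state the exact definitions used in the study, so the formulas below are the common textbook forms. The function names, the 252-periods-per-year annualization factor, and the zero default risk-free rate are assumptions made for illustration only.

```python
import math


def cumulative_return(equity):
    """Total return over the whole period, as a fraction of starting equity."""
    return equity[-1] / equity[0] - 1.0


def max_pullback(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest equity seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst


def sharpe_ratio(equity, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of per-period simple returns.

    Assumes one equity value per trading period; 252 periods per year is an
    assumed convention for daily data, not a value taken from the study.
    Requires at least three equity values (two returns).
    """
    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    excess = [r - risk_free_per_period for r in returns]
    mean = sum(excess) / len(excess)
    # Sample variance (n - 1 in the denominator).
    var = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    std = math.sqrt(var)
    if std == 0.0:
        return float("nan")  # undefined when returns never vary
    return mean / std * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical equity curve, not data from the study.
    curve = [100.0, 110.0, 99.0, 120.0]
    print(cumulative_return(curve))
    print(max_pullback(curve))
    print(sharpe_ratio(curve))
```

A higher cumulative return and Sharpe ratio together with a lower maximum pullback indicate a better strategy, which is the sense in which the xLSTM-based and LSTM-based models are compared.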