This study develops and evaluates a deep reinforcement learning framework for dynamic portfolio allocation across global equity markets. The Soft Actor-Critic algorithm is used to learn continuous portfolio weights within a Markov Decision Process, incorporating transaction costs, turnover penalties, and diversification constraints into the reward function. Five model configurations are compared, varying in reward formulation, policy structure (flat versus hierarchical Dirichlet), portfolio constraints, and temporal encoder (LSTM versus Transformer), and evaluated via walk-forward optimization across sixteen out-of-sample folds spanning 2003-2026 on the Nasdaq-100, Nikkei 225, and Euro Stoxx 50. Results show that RL strategies achieve competitive risk-adjusted performance primarily in the Euro Stoxx 50, where statistically significant abnormal returns are observed, but the central hypothesis is only partially confirmed: no strategy achieves statistically significant excess returns relative to Buy and Hold under HAC-robust inference across all markets. Regime analysis reveals that RL adds the most value during periods of elevated uncertainty, while ensemble aggregation across markets improves risk-adjusted performance and confirms the benefits of geographic diversification.