This paper presents a comprehensive study of ensemble Reinforcement Learning (RL) models for financial trading strategies, using classifier models to enhance performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers like Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates the effectiveness of various ensemble methods, comparing them with individual RL models across key financial metrics: Cumulative Return, Sharpe Ratio (SR), Calmar Ratio, and Maximum Drawdown (MDD). Our original experimental results demonstrate that ensemble methods often outperform base models in terms of risk-adjusted returns, with better drawdown control and greater overall stability. However, both the original analysis and the additional reproduction reported in this version show that ensemble performance is sensitive to the choice of variance threshold \(\tau\), classifier group, RL-agent pair, and market universe. The reproduction evidence strengthens the conclusion that classifier-assisted ensemble selection can improve robustness, while also clarifying that the advantage is conditional rather than automatic across all datasets. This study emphasizes the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.
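For readers less familiar with the four evaluation metrics named above, the following is a minimal sketch of how they are commonly computed from a series of per-period simple returns. It is not taken from the paper's code: the function name `performance_metrics`, the annualization factor of 252 trading days, the default zero risk-free rate, and the convention of reporting MDD as a non-positive number are all assumptions made for illustration.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization factor for daily returns


def performance_metrics(daily_returns, risk_free_rate=0.0):
    """Compute common trading metrics from a non-empty series of simple returns.

    Hypothetical helper for illustration; conventions (annualization,
    sign of drawdown) vary between papers and libraries.
    """
    r = np.asarray(daily_returns, dtype=float)

    # Wealth curve of 1 unit invested at the start.
    wealth = np.cumprod(1.0 + r)
    cumulative_return = wealth[-1] - 1.0

    # Drawdown relative to the running peak, counting the initial wealth 1.0.
    running_peak = np.maximum.accumulate(np.concatenate(([1.0], wealth)))[1:]
    drawdowns = wealth / running_peak - 1.0
    max_drawdown = drawdowns.min()  # non-positive by construction

    # Annualized Sharpe ratio on excess returns (sample standard deviation).
    excess = r - risk_free_rate / TRADING_DAYS
    std = excess.std(ddof=1) if len(r) > 1 else 0.0
    sharpe = np.sqrt(TRADING_DAYS) * excess.mean() / std if std > 0 else float("nan")

    # Calmar ratio: annualized return divided by the magnitude of MDD.
    annualized_return = wealth[-1] ** (TRADING_DAYS / len(r)) - 1.0
    calmar = annualized_return / abs(max_drawdown) if max_drawdown < 0 else float("nan")

    return {
        "cumulative_return": cumulative_return,
        "sharpe_ratio": sharpe,
        "calmar_ratio": calmar,
        "max_drawdown": max_drawdown,
    }


if __name__ == "__main__":
    print(performance_metrics([0.10, -0.50, 0.20]))
```

Under these conventions a higher Sharpe or Calmar ratio and a drawdown closer to zero indicate a better risk-return profile, which is the sense in which ensembles and base models are compared here.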