Applying deep learning to non-stationary temporal datasets can produce overfitted models that underperform under regime changes. In this work, we propose a modular machine learning pipeline for ranking predictions on temporal panel datasets that is robust to regime changes. The modularity of the pipeline allows the use of different models, including Gradient Boosting Decision Trees (GBDTs) and Neural Networks, with and without feature engineering. We evaluate our framework on financial data for stock portfolio prediction, and find that GBDT models with dropout display high performance, robustness and generalisability with reduced complexity and computational cost. We then demonstrate how online learning techniques, which require no retraining of models, can be applied post-prediction to enhance the results. First, we show that dynamic feature projection improves robustness by reducing drawdown during regime changes. Second, we demonstrate that dynamic model ensembling, which selects models with good recent performance, improves the Sharpe and Calmar ratios of out-of-sample predictions. We also evaluate the pipeline across different data splits and random seeds and find that the results are reproducible.
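As a rough illustration of the ensembling idea and the two ratios named above, the sketch below selects the models with the best recent scores and averages their ranked predictions. It is not the authors' implementation: the function names, the `lookback` and `top_k` parameters, the rank-averaging step, and the non-annualised, additive-return definitions of the Sharpe and Calmar ratios are all assumptions made for this example.

```python
import numpy as np


def sharpe_ratio(returns):
    """Mean per-period return over its sample standard deviation (not annualised)."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean() / returns.std(ddof=1)


def max_drawdown(returns):
    """Largest peak-to-trough drop of the additive cumulative return curve."""
    returns = np.asarray(returns, dtype=float)
    # Prepend 0 so that a loss in the very first period counts as a drawdown.
    curve = np.concatenate(([0.0], np.cumsum(returns)))
    running_peak = np.maximum.accumulate(curve)
    return float(np.max(running_peak - curve))


def calmar_ratio(returns):
    """Mean per-period return over maximum drawdown (assumed, non-annualised form)."""
    mdd = max_drawdown(returns)
    return float(np.mean(returns)) / mdd if mdd > 0 else float("inf")


def dynamic_ensemble(predictions, past_scores, lookback=4, top_k=2):
    """Average the ranked predictions of the models that scored best recently.

    predictions: (n_models, n_assets) predictions for the current period.
    past_scores: (n_periods, n_models) realised scores for periods that have
                 already resolved; only the last `lookback` rows are used.
    Returns the ensemble ranking and the indices of the selected models.
    """
    predictions = np.asarray(predictions, dtype=float)
    recent = np.asarray(past_scores, dtype=float)[-lookback:]
    mean_recent = recent.mean(axis=0)
    # Indices of the top_k models by mean recent score, best first.
    selected = np.argsort(mean_recent)[::-1][:top_k]
    # Convert each selected model's predictions to ranks (0 = lowest),
    # so that models with different output scales contribute equally.
    ranks = predictions[selected].argsort(axis=1).argsort(axis=1)
    return ranks.mean(axis=0), selected
```

No model is retrained here: only the weights over existing predictions change from period to period, which is what makes this kind of selection usable as a post-prediction step.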