The application of deep learning to non-stationary temporal datasets can lead to overfitted models that underperform under regime changes. In this work, we propose a modular machine learning pipeline for ranking predictions on temporal panel datasets that is robust under regime changes. The modularity of the pipeline allows the use of different models, including Gradient Boosting Decision Trees (GBDTs) and Neural Networks, with and without feature engineering. We evaluate our framework on financial data for stock portfolio prediction, and find that GBDT models with dropout display high performance, robustness and generalisability with reduced complexity and computational cost. We then demonstrate how online learning techniques, which require no retraining of models, can be used post-prediction to enhance the results. First, we show that dynamic feature projection improves robustness by reducing drawdown during regime changes. Second, we demonstrate that dynamic model ensembling, which selects models with good recent performance, improves the Sharpe and Calmar ratios of out-of-sample predictions. We also evaluate the pipeline across different data splits and random seeds and find that the results are reproducible.
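As a rough illustration of the second post-prediction step, the sketch below selects, for each period, the models with the best recent realised scores and averages their predictions, then summarises a return series with Sharpe and Calmar ratios. It is a minimal sketch and not the pipeline's actual implementation: the function names, the `lookback` and `k` parameters, the array shapes, the use of a simple mean for ranking models, and the non-annualised ratio definitions (mean over standard deviation, and mean over maximum drawdown of the cumulative sum) are all assumptions made for illustration.

```python
import numpy as np


def sharpe_ratio(returns):
    """Mean divided by standard deviation of per-period returns.

    Assumption: not annualised, population standard deviation (ddof=0).
    """
    returns = np.asarray(returns, dtype=float)
    return float(returns.mean() / returns.std(ddof=0))


def max_drawdown(returns):
    """Largest peak-to-trough fall of the cumulative (summed) return curve."""
    returns = np.asarray(returns, dtype=float)
    curve = np.concatenate([[0.0], np.cumsum(returns)])
    running_peak = np.maximum.accumulate(curve)
    return float((running_peak - curve).max())


def calmar_ratio(returns):
    """Mean per-period return divided by maximum drawdown (assumed definition)."""
    returns = np.asarray(returns, dtype=float)
    mdd = max_drawdown(returns)
    if mdd == 0.0:
        return float("inf")  # the series never falls below a previous peak
    return float(returns.mean() / mdd)


def dynamic_ensemble(preds, scores, lookback, k):
    """Average the k models with the best mean score over the last `lookback` periods.

    preds  : array of shape (n_periods, n_models, n_assets), hypothetical layout
    scores : array of shape (n_periods, n_models), realised per-period performance
    Only scores from periods strictly before t are used for period t,
    so no model is retrained and no future information is used.
    """
    preds = np.asarray(preds, dtype=float)
    scores = np.asarray(scores, dtype=float)
    blended, chosen = [], []
    for t in range(lookback, scores.shape[0]):
        recent = scores[t - lookback:t].mean(axis=0)
        top = np.argsort(-recent, kind="stable")[:k]  # indices of the best models
        chosen.append(top)
        blended.append(preds[t, top].mean(axis=0))
    return np.array(blended), np.array(chosen)
```

Calling `dynamic_ensemble(preds, scores, lookback=1, k=1)` returns one blended prediction vector per period from index `lookback` onwards, together with the indices of the models selected for each of those periods. In a live setting the realised score for a period may only become available after a delay, in which case the lookback window would need to be shifted back accordingly.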