We present a robust deep incremental learning framework for regression tasks on financial temporal tabular datasets. The framework is built on the incremental use of commonly available tabular and time series prediction models, which lets it adapt to the distributional shifts typical of financial datasets. It uses a simple building block (decision trees) to build self-similar models of any required complexity, delivering robust performance under adverse conditions such as regime changes, fat-tailed distributions, and low signal-to-noise ratios. As a detailed study, we demonstrate our scheme using XGBoost models trained on the Numerai dataset and show that a two-layer deep ensemble of XGBoost models over different model snapshots delivers high-quality predictions under different market regimes. We also show that the performance of XGBoost models with different numbers of boosting rounds, in three scenarios (small, standard and large), increases monotonically with model size and converges towards the generalisation upper bound. We further evaluate the robustness of the model to variation in hyperparameters such as model complexity and data sampling settings. Our model has low hardware requirements, as no specialised neural architectures are used and each base model can be trained independently in parallel.
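The two-layer ensemble over model snapshots can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces XGBoost with hand-rolled depth-1 regression trees so that it runs on the Python standard library alone, and it uses synthetic data with a drifting feature mean instead of the Numerai dataset. All names (`fit_stump`, `boost`, `predict`, `make_block`, `SNAPSHOTS`) and all settings (snapshot rounds, learning rate, block sizes) are assumptions chosen for illustration. The first layer is trained on an early block of data, its predictions at several boosting rounds serve as snapshot features, and the second layer is trained on a later block using those features.

```python
import random

def fit_stump(rows, residuals):
    """Fit one depth-1 regression tree (stump) minimising squared error."""
    best = None
    for j in range(len(rows[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are always non-empty.
        for t in sorted({row[j] for row in rows})[:-1]:
            left = [r for row, r in zip(rows, residuals) if row[j] <= t]
            right = [r for row, r in zip(rows, residuals) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]  # (feature index, threshold, left value, right value)

def boost(rows, targets, rounds, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    stumps, preds = [], [0.0] * len(rows)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(targets, preds)]
        j, t, lv, rv = fit_stump(rows, residuals)
        stumps.append((j, t, lv, rv))
        preds = [p + lr * (lv if row[j] <= t else rv) for row, p in zip(rows, preds)]
    return stumps

def predict(stumps, row, lr=0.3, n_rounds=None):
    """Predict with the first n_rounds stumps, i.e. an earlier snapshot of the model."""
    return sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps[:n_rounds])

def mse(preds, targets):
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)

rng = random.Random(0)

def make_block(n, shift):
    """Synthetic block of rows; `shift` moves the first feature's mean (assumed drift)."""
    rows = [[rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
    targets = [0.5 * a - 0.3 * b + rng.gauss(0.0, 0.5) for a, b in rows]
    return rows, targets

# Three consecutive blocks in time order: layer-1 training, layer-2 training, evaluation.
rows1, y1 = make_block(60, 0.0)
rows2, y2 = make_block(60, 0.5)
rows3, y3 = make_block(60, 1.0)

SNAPSHOTS = (5, 10, 20)  # boosting rounds at which layer-1 snapshots are taken (assumed)
layer1 = boost(rows1, y1, rounds=max(SNAPSHOTS))

def snapshot_features(row):
    """Layer-1 predictions at each snapshot become the layer-2 input features."""
    return [predict(layer1, row, n_rounds=k) for k in SNAPSHOTS]

# Layer 2 reuses the same building block, trained on the later block.
layer2 = boost([snapshot_features(r) for r in rows2], y2, rounds=10)

final_preds = [predict(layer2, snapshot_features(r)) for r in rows3]
train_mse = {k: mse([predict(layer1, r, n_rounds=k) for r in rows1], y1) for k in SNAPSHOTS}
print("layer-1 training MSE by snapshot:", train_mse)
print("two-layer MSE on the final block:", mse(final_preds, y3))
```

Both layers call the same `boost` routine, which is the sense in which the sketch is self-similar. With XGBoost, the snapshots would more naturally come from restricting prediction to a prefix of the boosting rounds of a single trained model.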