In this paper, we present a robust deep incremental learning model for regression tasks on financial temporal tabular datasets. Using commonly available tabular and time-series prediction models as building blocks, a machine-learning model is built incrementally to adapt to distributional shifts in the data. Using the concept of self-similarity, the model relies on a single basic building block of machine learning, the decision tree, to build models of any required complexity. The model is demonstrated to perform robustly under adverse conditions such as regime changes, fat-tailed distributions and low signal-to-noise ratios, which are common in financial datasets. Model robustness is studied under different hyper-parameters, such as model complexity and data sampling settings, using XGBoost models trained on the Numerai dataset as a detailed case study. A two-layer deep ensemble of XGBoost models over different model snapshots is demonstrated to deliver high-quality predictions under different market regimes. Comparing XGBoost models with different numbers of boosting rounds in three scenarios (small, standard and large), we demonstrate that model performance increases monotonically with model size and converges towards the generalisation upper bound. Our model is efficient, with much lower hardware requirements than other machine learning models, as no specialised neural architectures are used and each base model can be trained independently in parallel.
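To make the idea of ensembling over model snapshots concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes that a "snapshot" means the same boosted model truncated at a given number of boosting rounds, and that snapshots are combined by an equal-weight average; both are assumptions made for illustration. A toy gradient-boosted ensemble of one-split decision trees (stumps) on a single feature stands in for XGBoost, and the names `fit_stump`, `fit_boosted_stumps`, `predict_snapshot` and `snapshot_ensemble_predict` are hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression stump on a single feature by least squares."""
    best = None
    # Exclude the largest value so both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = sum((r - left_value) ** 2 for r in left)
        sse += sum((r - right_value) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds, learning_rate=0.1):
    """Toy gradient boosting for squared error (a stand-in for XGBoost)."""
    base = mean(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return base, stumps, learning_rate


def predict_snapshot(model, x, n_rounds):
    """Predict with only the first n_rounds stumps (one model snapshot)."""
    base, stumps, learning_rate = model
    pred = base
    for threshold, left_value, right_value in stumps[:n_rounds]:
        pred += learning_rate * (left_value if x <= threshold else right_value)
    return pred


def snapshot_ensemble_predict(model, x, snapshot_rounds):
    """Equal-weight average over snapshots (an assumed combination rule)."""
    return mean(predict_snapshot(model, x, k) for k in snapshot_rounds)


# Toy data: a step function of one feature.
train_x = list(range(10))
train_y = [0.0] * 5 + [1.0] * 5
toy_model = fit_boosted_stumps(train_x, train_y, n_rounds=20, learning_rate=0.5)
print(snapshot_ensemble_predict(toy_model, 2, [5, 10, 20]))
print(snapshot_ensemble_predict(toy_model, 7, [5, 10, 20]))
```

Because every snapshot is read from one trained model, the ensemble adds no training cost beyond the largest number of boosting rounds. Averaging early and late snapshots is one plausible way to trade off under-fitted and over-fitted predictions.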