In this paper, we present a robust incremental learning pipeline for regression tasks on temporal tabular datasets. Using commonly available tabular and time-series prediction models as building blocks, a machine-learning pipeline is built incrementally to adapt to distributional shifts. The pipeline applies to any standardised dataset, as no data-dependent feature engineering methods are required. Using the concept of self-similarity, the pipeline uses only two basic building blocks of ML models, gradient boosting decision trees and neural networks, to build models of any required complexity. The pipeline is efficient, as no specialised neural architectures are used and each model building block can be trained independently. The pipeline is demonstrated to perform robustly under adverse conditions such as regime changes, fat-tailed distributions and low signal-to-noise ratios.
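As a rough illustration of what incremental building from independently trained blocks can look like, the sketch below runs a walk-forward loop over synthetic data with a regime change. It is not the authors' pipeline: the names `fit_block`, `make_eras` and `walk_forward`, the window sizes, and the synthetic data are all hypothetical, and a one-feature least-squares fit stands in for the gradient boosting decision trees and neural networks named above so that the example needs only the Python standard library.

```python
import random
from statistics import mean

def fit_block(xs, ys):
    """Fit y ~ a + b*x by least squares (hypothetical stand-in for a GBDT block)."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def make_eras(n_eras=40, n_rows=50, seed=0):
    """Synthetic temporal data: the slope flips sign halfway (a regime change)."""
    rng = random.Random(seed)
    eras = []
    for t in range(n_eras):
        slope = 1.0 if t < n_eras // 2 else -1.0
        xs = [rng.uniform(-1, 1) for _ in range(n_rows)]
        ys = [slope * x + rng.gauss(0, 0.5) for x in xs]
        eras.append((xs, ys))
    return eras

def walk_forward(eras, window=5, keep=3):
    """At each era, train one new block on the previous `window` eras only,
    keep the `keep` most recent blocks, and score their averaged prediction
    on the current (unseen) era. Returns the per-era mean squared error."""
    blocks, errors = [], []
    for t in range(window, len(eras)):
        train = eras[t - window:t]              # strictly past data, no look-ahead
        xs = [x for era in train for x in era[0]]
        ys = [y for era in train for y in era[1]]
        blocks.append(fit_block(xs, ys))        # each block is trained independently
        blocks = blocks[-keep:]                 # discard stale blocks
        test_x, test_y = eras[t]
        preds = [mean(m(x) for m in blocks) for x in test_x]
        errors.append(mean((p - y) ** 2 for p, y in zip(preds, test_y)))
    return errors

errors = walk_forward(make_eras())
```

Under these assumptions, the error rises sharply at the era where the slope flips and falls again once the retained blocks have all been trained on post-change data, which is the adaptation behaviour an incremental scheme is meant to provide.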