Tree-based models have been successfully applied to a wide variety of tasks, including time series forecasting. They are increasingly in demand and widely accepted because of their comparatively high interpretability. However, many of them are prone to overfitting, which limits their use in real-world decision-making. The problem is even more severe in online forecasting, where time series observations arrive incrementally and the distributions from which they are drawn may keep changing over time. In this context, we propose a novel method for the online selection of tree-based models for time series forecasting using the TreeSHAP explainability method. We start with an arbitrary set of different tree-based models. We then design a performance-based ranking that lets TreeSHAP specialize the tree-based forecasters across different regions of the input time series. In this framework, model selection is performed online and is adapted whenever drift is detected in the time series. In addition, explainability is supported on three levels: online input importance, model selection, and model output explanation. An extensive empirical study on various real-world datasets demonstrates that our method achieves excellent or on-par results compared with state-of-the-art approaches and several baselines.
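To make the online selection loop concrete, the following is a minimal sketch of drift-triggered re-selection from a pool of forecasters. It is not the method proposed here. Several parts are assumptions made only for illustration: the three simple forecasters are hypothetical stand-ins for fitted tree-based models, the TreeSHAP-based ranking is replaced by a plain mean absolute error over a recent window, and the Page-Hinkley test is one possible drift detector (the abstract does not name one). The parameters `lag`, `eval_len`, `delta`, and `threshold` are likewise illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Forecaster = Callable[[Sequence[float]], float]


# Hypothetical stand-ins for fitted tree-based forecasters (assumption).
def last_value(window: Sequence[float]) -> float:
    return window[-1]


def window_mean(window: Sequence[float]) -> float:
    return sum(window) / len(window)


def linear_trend(window: Sequence[float]) -> float:
    # Extrapolate one step ahead using the most recent difference.
    return window[-1] + (window[-1] - window[-2])


class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta: float = 0.05, threshold: float = 5.0) -> None:
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0
        self.cum_min = 0.0

    def update(self, x: float) -> bool:
        """Consume one value; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def select_best(pool: Dict[str, Forecaster], history: List[float],
                lag: int, eval_len: int) -> str:
    """Rank the pool by mean absolute error on the last eval_len points.

    Assumption: this simple ranking replaces the TreeSHAP-based one.
    """
    def recent_error(forecast: Forecaster) -> float:
        targets = range(len(history) - eval_len, len(history))
        errors = [abs(forecast(history[t - lag:t]) - history[t]) for t in targets]
        return sum(errors) / len(errors)

    return min(pool, key=lambda name: recent_error(pool[name]))


def run_online(series: Sequence[float], pool: Dict[str, Forecaster],
               lag: int = 3, eval_len: int = 5) -> Tuple[List[float], List[str]]:
    """Forecast one step ahead; re-select the model only when drift is detected."""
    detector = PageHinkley()
    start = lag + eval_len
    history = list(series[:start])
    active = select_best(pool, history, lag, eval_len)
    forecasts: List[float] = []
    chosen: List[str] = []
    for y in series[start:]:
        pred = pool[active](history[-lag:])
        forecasts.append(pred)
        chosen.append(active)
        history.append(y)
        # Monitor the absolute error of the currently selected model.
        if detector.update(abs(pred - y)):
            active = select_best(pool, history, lag, eval_len)
            detector.reset()
    return forecasts, chosen


pool = {"last_value": last_value, "window_mean": window_mean,
        "linear_trend": linear_trend}
# A flat segment followed by a linear ramp, so the data distribution changes.
series = [1.0] * 20 + [float(v) for v in range(2, 22)]
forecasts, chosen = run_online(series, pool)
print(chosen[0], "->", chosen[-1])
```

On this synthetic series the loop keeps its initial choice through the flat segment and switches to another forecaster only after the detector fires on the ramp. Re-ranking only on detected drift, rather than at every step, is the design choice the sketch is meant to show.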