This paper shows that time series forecasting Transformers (TSFTs) suffer from a severe over-fitting problem caused by improper initialization of the unknown decoder inputs, especially when handling non-stationary time series. Based on this observation, we propose GBT, a novel two-stage Transformer framework with a Good Beginning. It decouples the prediction process of a TSFT into two stages, an Auto-Regression stage and a Self-Regression stage, to address the differing statistical properties of the input and prediction sequences. The prediction results of the Auto-Regression stage serve as a Good Beginning, i.e., a better initialization for the inputs of the Self-Regression stage. We also propose an Error Score Modification module to further enhance the forecasting capability of the Self-Regression stage in GBT. Extensive experiments on seven benchmark datasets demonstrate that GBT outperforms SOTA TSFTs (FEDformer, Pyraformer, ETSformer, etc.) and many other forecasting models (SCINet, N-HiTS, etc.) using only canonical attention and convolution, while having lower time and space complexity. It is also general enough to be coupled with these models to strengthen their forecasting capability. The source code is available at: https://github.com/OrigamiSL/GBT