Transformers have demonstrated impressive strength in long-term series forecasting. Existing forecasting research has mostly focused on mapping a short past sub-series (the lookback window) to a future series (the forecast window). The much longer time series in the training set is discarded once training is complete, so at inference a model can rely only on the information in the lookback window, which prevents it from analyzing the time series from a global perspective. Moreover, the windows used by Transformers are quite narrow because every time-step in them must be modeled. Under this point-wise processing style, widening the windows rapidly exhausts model capacity. For fine-grained time series, this creates a bottleneck in both information input and prediction output, which is fatal to long-term series forecasting. To overcome this barrier, we propose a new methodology for applying Transformers to time series forecasting. Specifically, we split time series into patches by day and replace point-wise processing with patch-wise processing, which considerably enhances the information input and output of Transformers. To further help models leverage the global information of the whole training set during inference, we distill that information, store it in time representations, and replace series with time representations as the main modeling entities. Our time-modeling Transformer, Dateformer, yields state-of-the-art accuracy on 7 real-world datasets with a 33.6\% relative improvement and extends the maximum forecast range to half a year.
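As a rough illustration of the day-wise patching idea described above, the following sketch reshapes a regularly sampled series into one patch per day, so that the sequence the model attends over shrinks from the number of time-steps to the number of days. It is not the authors' implementation: the function name `split_into_day_patches`, the `steps_per_day` parameter, and the choice to drop a trailing incomplete day are all assumptions made for this example.

```python
import numpy as np


def split_into_day_patches(series, steps_per_day):
    """Reshape a (T,) or (T, C) series into (D, steps_per_day, C) day patches.

    Hypothetical helper: assumes regular sampling with a fixed number of
    steps per day, and drops any trailing incomplete day.
    """
    series = np.asarray(series)
    if series.ndim == 1:
        series = series[:, None]  # treat a 1-D series as a single channel
    n_days = series.shape[0] // steps_per_day
    trimmed = series[: n_days * steps_per_day]
    return trimmed.reshape(n_days, steps_per_day, series.shape[1])


# One week of hourly data: 168 time-steps under point-wise processing.
hourly = np.arange(7 * 24, dtype=float)
patches = split_into_day_patches(hourly, steps_per_day=24)

# Flatten each day into one vector, giving 7 patch tokens instead of 168 points.
tokens = patches.reshape(patches.shape[0], -1)
```

Under these assumptions, a lookback window of one week costs 7 sequence positions rather than 168, which is the sense in which patch-wise processing widens the input and output a fixed-capacity model can handle.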