The Transformer is a highly successful deep learning model that has revolutionised the field of artificial neural networks, first in natural language processing and later in computer vision. The model is based on the attention mechanism and can capture complex semantic relationships among the patterns present in the input data. Because of these characteristics, the Transformer has recently been applied to time series forecasting, on the assumption that it adapts naturally to the domain of continuous numerical series. Despite the acclaimed results reported in the literature, some works have raised doubts about the robustness of this approach. In this paper, we further investigate the effectiveness of Transformer-based models for time series forecasting, demonstrate their limitations, and propose a set of alternative models that perform better and are significantly less complex. In particular, we show empirically that simplifying this forecasting model almost always leads to an improvement, reaching the state of the art among Transformer-based architectures. We also propose shallow models without the attention mechanism that compete with the overall state of the art in long time series forecasting, and we demonstrate their ability to accurately predict extremely long windows. We show that a simple baseline should always be used to verify the effectiveness of one's models. Finally, we conclude with a reflection on recent research directions and on the tendency to follow trends and apply the latest model even where it may not be necessary.