The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks, first in natural language processing and later in computer vision. The model is based on the attention mechanism and can capture complex semantic relationships between a variety of patterns present in the input data. Precisely because of these characteristics, the Transformer has recently been applied to time series forecasting problems, on the assumption that it adapts naturally to the domain of continuous numerical series. Despite the acclaimed results reported in the literature, some works have raised doubts about the robustness and effectiveness of this approach. In this paper, we further investigate the effectiveness of Transformer-based models applied to time series forecasting, demonstrate their limitations, and propose a set of alternative models that perform better and are significantly less complex. In particular, we show empirically that simplifying Transformer-based forecasting models almost always leads to an improvement, reaching state-of-the-art performance. We also propose shallow models without the attention mechanism, which compete with the overall state of the art in long time series forecasting, and demonstrate their ability to accurately predict time series over extremely long windows. From a methodological perspective, we show that it is always necessary to use a simple baseline to verify the effectiveness of proposed models. Finally, we conclude the paper with a reflection on recent research paths and on whether it is advisable to follow trends and hype even where it may not be necessary.
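The methodological point about simple baselines can be illustrated with a minimal sketch. The last-value (persistence) baseline below is an assumption chosen for illustration only; the abstract does not state which baseline the authors use, and the function names and the toy series are hypothetical.

```python
from typing import List, Sequence


def persistence_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Naive baseline: repeat the last observed value for every future step."""
    if not history:
        raise ValueError("history must contain at least one value")
    return [history[-1]] * horizon


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average squared difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy series (illustrative data, not taken from the paper).
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
history, future = series[:4], series[4:]

# Score the baseline on the held-out steps; any proposed model would be
# scored on the same split and compared against this number.
baseline = persistence_forecast(history, horizon=len(future))
baseline_mse = mean_squared_error(future, baseline)
print(f"persistence baseline MSE: {baseline_mse}")
```

A forecasting model whose error on the same held-out window is not clearly below `baseline_mse` gives little evidence that its added complexity is justified.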