The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks, first in natural language processing and later in computer vision. The model is based on the attention mechanism and can capture complex semantic relationships between a variety of patterns present in the input data. Precisely because of these characteristics, the Transformer has recently been applied to time series forecasting problems, on the assumption that it adapts naturally to the domain of continuous numerical series. Despite the acclaimed results in the literature, some works have raised doubts about the robustness and effectiveness of this approach. In this paper, we further investigate the effectiveness of Transformer-based models applied to time series forecasting, demonstrate their limitations, and propose a set of alternative models that perform better and are significantly less complex. In particular, we empirically show that simplifying Transformer-based forecasting models almost always leads to an improvement, reaching state-of-the-art performance. We also propose shallow models without the attention mechanism, which compete with the overall state of the art in long time series forecasting, and demonstrate their ability to accurately predict time series over extremely long windows. From a methodological perspective, we show that it is always necessary to use a simple baseline to verify the effectiveness of proposed models. Finally, we conclude the paper with a reflection on recent research paths and on the wisdom of following trends and hype even where it may not be necessary.