Two Steps Forward and One Behind: Rethinking Time Series Forecasting with Deep Learning

The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks, first in natural language processing and later in computer vision. This model is based on the attention mechanism and is able to capture complex semantic relationships between a variety of patterns present in the input data. Precisely because of these characteristics, the Transformer has recently been exploited for time series forecasting problems, assuming a natural adaptability to the domain of continuous numerical series. Despite the acclaimed results in the literature, some works have raised doubts about the robustness and effectiveness of this approach. In this paper, we further investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting, demonstrate their limitations, and propose a set of alternative models that are better performing and significantly less complex. In particular, we empirically show how simplifying Transformer-based forecasting models almost always leads to an improvement, reaching state of the art performance. We also propose shallow models without the attention mechanism, which compete with the overall state of the art in long time series forecasting, and demonstrate their ability to accurately predict time series over extremely long windows. From a methodological perspective, we show how it is always necessary to use a simple baseline to verify the effectiveness of proposed models, and finally, we conclude the paper with a reflection on recent research paths and the opportunity to follow trends and hypes even where it may not be necessary.

翻译：Transformer是一种非常成功的深度学习模型，它首先在自然语言处理领域，随后在计算机视觉领域彻底改变了人工神经网络的世界。该模型基于注意力机制，能够捕捉输入数据中多种模式之间的复杂语义关系。正是由于这些特性，Transformer最近被应用于时间序列预测问题，假设其对连续数值序列领域具有天然的适应性。尽管文献中报道了令人瞩目的结果，但一些研究对该方法的鲁棒性和有效性提出了质疑。本文进一步研究了基于Transformer的模型在时间序列预测领域的有效性，展示了其局限性，并提出了一系列性能更优且显著简化结构的替代模型。特别是，我们通过实验证明，简化基于Transformer的预测模型几乎总能带来性能提升，达到最先进的水平。我们还提出了不依赖于注意力机制的浅层模型，与长期时间序列预测的整体最先进水平相媲美，并展示了其在极长窗口内准确预测时间序列的能力。从方法论的角度，我们展示了始终有必要使用简单的基线模型来验证所提出模型的有效性；最后，本文对近年来的研究方向以及在不必要的情况下追随潮流和热点的行为进行了反思。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【牛津大学】深度学习时间序列预测，12页pdf, Deep Learning Time Series Forecasting

专知会员服务

174+阅读 · 2020年5月1日

【牛津大学】深度学习时间序列预测，Time Series Forecasting With Deep Learning: A Survey

专知会员服务

142+阅读 · 2020年4月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日