Deep Double Descent for Time Series Forecasting: Avoiding Undertrained Models

Deep learning models, particularly Transformers, have achieved impressive results in various domains, including time series forecasting. While the existing time series literature focuses primarily on model architecture modifications and data augmentation techniques, this paper explores the training schema of deep learning models for time series, that is, how models are trained regardless of their architecture. We perform extensive experiments to investigate the occurrence of deep double descent in several Transformer models trained on public time series data sets. We demonstrate epoch-wise deep double descent and show that overfitting can be reverted by training for more epochs. Leveraging these findings, we achieve state-of-the-art results for long sequence time series forecasting in nearly 70% of the 72 benchmarks tested. This suggests that many models in the literature may possess untapped potential. Additionally, we introduce a taxonomy for classifying training schema modifications, covering data augmentation, model inputs, model targets, time series per model, and computational budget.
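To make the epoch-wise claim concrete, the following minimal sketch shows how patience-based early stopping can halt training before a second descent in validation loss. It is an illustration only, not the authors' procedure: the abstract does not say which stopping criterion was used, the helper names `early_stopping_epoch` and `best_epoch` are hypothetical, and the loss values are a hand-written synthetic curve, not results from the paper.

```python
from typing import List


def early_stopping_epoch(val_losses: List[float], patience: int) -> int:
    """Epoch index of the checkpoint kept by patience-based early stopping.

    Training halts once `patience` consecutive epochs pass without a new
    minimum in validation loss; the best checkpoint seen so far is kept.
    """
    best_idx = 0
    best_loss = val_losses[0]
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_idx = epoch
        elif epoch - best_idx >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_idx


def best_epoch(val_losses: List[float]) -> int:
    """Epoch index of the global minimum over the full training run."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])


# Synthetic, hand-written curve (NOT data from the paper): a first descent,
# an overfitting bump, then a second descent to a lower minimum.
val_losses = [1.00, 0.80, 0.70, 0.75, 0.85, 0.90, 0.80, 0.65, 0.55, 0.50, 0.52]

short_run = early_stopping_epoch(val_losses, patience=3)
long_run = early_stopping_epoch(val_losses, patience=10)
print(short_run, val_losses[short_run])  # checkpoint kept with short patience
print(long_run, val_losses[long_run])    # checkpoint kept with long patience
print(best_epoch(val_losses))            # global best over all epochs
```

On this synthetic curve, a patience of 3 keeps the checkpoint from the first minimum, while a larger patience lets training continue through the bump and reach the lower second minimum. That is the kind of undertrained outcome the abstract warns about.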
