Deep learning models, particularly Transformers, have achieved impressive results in various domains, including time series forecasting. While the existing time series literature focuses primarily on model architecture modifications and data augmentation techniques, this paper explores the training schema of deep learning models for time series, that is, how models are trained regardless of their architecture. We perform extensive experiments to investigate the occurrence of deep double descent in several Transformer models trained on public time series data sets. We demonstrate epoch-wise deep double descent and show that overfitting can be reversed by training for more epochs. Leveraging these findings, we achieve state-of-the-art results for long sequence time series forecasting in nearly 70% of the 72 benchmarks tested. This suggests that many models in the literature may possess untapped potential. Additionally, we introduce a taxonomy for classifying training schema modifications, covering data augmentation, model inputs, model targets, time series per model, and computational budget.