Deep learning approaches are increasingly used to tackle forecasting tasks. A key factor in the successful application of these methods is a sufficiently large training sample, which is not always available. In these scenarios, synthetic data generation techniques are usually applied to augment the dataset. Data augmentation is typically applied once, before fitting a model, so the model only ever sees a single augmented dataset, which potentially limits the effectiveness of the approach. This work introduces OnDAT (On-the-fly Data Augmentation for Time series) to address this issue by applying data augmentation during training and validation. In contrast to traditional methods that create a single, static augmented dataset beforehand, OnDAT performs augmentation on the fly. Because a new augmented dataset is generated on each iteration, the model is exposed to constantly changing variations of the augmented data. We hypothesize that this process enables a better exploration of the data space, which reduces the potential for overfitting and improves forecasting performance. We validated the proposed approach using a state-of-the-art deep learning forecasting method and 8 benchmark datasets containing a total of 75797 time series. The experiments suggest that OnDAT leads to better forecasting performance than both a strategy that applies data augmentation before training and a strategy that does not involve data augmentation. The method and experiments are publicly available.
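The contrast between a static augmented dataset and on-the-fly augmentation can be illustrated with a minimal sketch. It is not the OnDAT implementation: the Gaussian `jitter` function is only a placeholder augmentation, and the names `static_batches` and `on_the_fly_batches`, the `scale` parameter, and the toy data are all assumptions made for illustration.

```python
import random


def jitter(series, scale, rng):
    """Placeholder augmentation (an assumption, not the paper's technique):
    add Gaussian noise to every observation of one series."""
    return [x + rng.gauss(0.0, scale) for x in series]


def static_batches(dataset, n_iterations, scale, seed=0):
    """Augment once before training; every iteration reuses the same synthetic series."""
    rng = random.Random(seed)
    augmented = [jitter(s, scale, rng) for s in dataset]
    for _ in range(n_iterations):
        yield dataset + augmented


def on_the_fly_batches(dataset, n_iterations, scale, seed=0):
    """Regenerate the synthetic series on each iteration, so the model
    sees a different augmented dataset every time."""
    rng = random.Random(seed)
    for _ in range(n_iterations):
        yield dataset + [jitter(s, scale, rng) for s in dataset]


if __name__ == "__main__":
    toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # two hypothetical short series
    for i, batch in enumerate(on_the_fly_batches(toy, n_iterations=2, scale=0.1)):
        # The last two entries are the synthetic series; they change per iteration.
        print(i, batch[2:])
```

In a real training loop, each yielded batch would be passed to the forecasting model's update step. The original series are kept unchanged in every batch, and only the synthetic part varies between iterations.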