Models pre-trained on a similar or diverse source data set have become pivotal in improving the efficiency and accuracy of time series forecasting on target data sets through transfer learning. While benchmarks validate how well models generalize to various target data sets, there is no structured research providing similarity and diversity measures that explain which characteristics of source and target data lead to transfer learning success. Our study is the first to systematically evaluate the impact of source-target similarity and source diversity on zero-shot and fine-tuned forecasting outcomes in terms of accuracy, bias, and uncertainty estimation. We investigate these dynamics using neural networks pre-trained on five public source data sets and applied to forecasting five target data sets, including real-world wholesale data. We identify two feature-based similarity and diversity measures, which show that source-target similarity enhances forecasting accuracy and reduces bias, while source diversity enhances forecasting accuracy and uncertainty estimation but increases bias.