Although the paradigm of pre-training followed by fine-tuning is used extensively in many fields, the impact of pre-training on the fine-tuning process remains controversial. Experimental findings on text and image data have not reached a consensus. To examine the unsupervised pre-training followed by fine-tuning paradigm more closely, we extend previous research to a new modality: time series. In this study, we conduct a thorough examination of 150 classification datasets derived from the Univariate Time Series (UTS) and Multivariate Time Series (MTS) benchmarks. Our analysis yields several key conclusions. (i) Pre-training improves the optimization process only for models that fit the data poorly, not for those that fit the data well. (ii) Pre-training does not act as a regularizer when given sufficient training time. (iii) Pre-training speeds up convergence only if the model has sufficient capacity to fit the data. (iv) Adding more pre-training data does not improve generalization, but it can strengthen the advantages that pre-training already shows at the original data volume, such as faster convergence. (v) While both the pre-training task and the model structure determine the effectiveness of the paradigm on a given dataset, the model structure plays the more significant role.