Training deep neural networks (DNNs) is computationally expensive, which is especially problematic when similar training runs are repeated, for example when building model ensembles or fine-tuning pre-trained models. Once a DNN has been trained on a dataset, we have its learning trajectory (i.e., the sequence of intermediate parameters during training), which may contain useful information for learning that dataset. However, there has been no attempt to reuse the information in a given learning trajectory for another training run. In this paper, we formulate the problem of "transferring" a given learning trajectory from one initial parameter to another (the learning transfer problem) and derive the first algorithm to approximately solve it by successively matching gradients along the trajectory via permutation symmetry. We empirically show that the transferred parameters achieve non-trivial accuracy before any direct training and can be trained significantly faster than training from scratch.
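The permutation symmetry mentioned above can be illustrated with a minimal sketch. This is not the transfer algorithm itself; it only shows the underlying fact that reordering the hidden units of a network leaves its output unchanged. The two-layer ReLU MLP, the layer sizes, and the names `mlp` and `permute_hidden` are assumptions made for illustration.

```python
import random


def relu(v):
    return v if v > 0.0 else 0.0


def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP on plain lists: y = W2 * relu(W1 * x + b1) + b2."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]


def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: rows of W1, entries of b1, columns of W2."""
    W1p = [W1[j] for j in perm]
    b1p = [b1[j] for j in perm]
    W2p = [[row[j] for j in perm] for row in W2]
    return W1p, b1p, W2p


rng = random.Random(0)
n_in, n_hidden, n_out = 3, 5, 2  # illustrative sizes

# Random parameters and a random input.
W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
W2 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [rng.gauss(0, 1) for _ in range(n_out)]
x = [rng.gauss(0, 1) for _ in range(n_in)]

# A random reordering of the hidden units.
perm = list(range(n_hidden))
rng.shuffle(perm)
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

y = mlp(x, W1, b1, W2, b2)
y_perm = mlp(x, W1p, b1p, W2p, b2)

# The outputs agree up to floating-point rounding from the reordered sums.
max_diff = max(abs(a - b) for a, b in zip(y, y_perm))
print(max_diff < 1e-9)
```

Because the permuted parameters define the same function, a method has freedom to choose the permutation so that other quantities, such as gradients along a trajectory, line up between two networks.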