Training deep neural networks (DNNs) is computationally expensive, which is especially problematic when training runs are repeated, as in model ensembling or knowledge distillation. Once a DNN has been trained on a dataset, we have its learning trajectory (i.e., the sequence of intermediate parameters visited during training), which may contain useful information for learning that dataset. However, there has been no attempt to reuse the information in a given learning trajectory for another training run. In this paper, we formulate the problem of "transferring" a given learning trajectory from one initial parameter to another, which we call the learning transfer problem, and derive the first algorithm that approximately solves it by successively matching gradients along the trajectory via permutation symmetry. We empirically show that the transferred parameters achieve non-trivial accuracy before any direct training. We also analyze the loss landscape properties of the transferred parameters, especially from the viewpoint of mode connectivity.
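The permutation symmetry mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's transfer algorithm; it only shows the underlying fact that reordering the hidden units of a network, together with the matching rows and columns of the adjacent weight matrices, leaves the computed function unchanged. The two-layer ReLU network and the names `mlp` and `permute_hidden` are assumptions made for illustration.

```python
import random


def mlp(x, W1, b1, W2, b2):
    """Two-layer ReLU MLP applied to a single input vector (plain lists)."""
    hidden = [
        max(sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j], 0.0)
        for j in range(len(b1))
    ]
    return [
        sum(hidden[j] * W2[j][k] for j in range(len(hidden))) + b2[k]
        for k in range(len(b2))
    ]


def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: columns of W1, entries of b1, rows of W2."""
    W1p = [[row[j] for j in perm] for row in W1]
    b1p = [b1[j] for j in perm]
    W2p = [W2[j] for j in perm]
    return W1p, b1p, W2p


# Hypothetical sizes, chosen only for the demonstration.
rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 2
W1 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_in)]
b1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
W2 = [[rng.gauss(0, 1) for _ in range(n_out)] for _ in range(n_hidden)]
b2 = [rng.gauss(0, 1) for _ in range(n_out)]
x = [rng.gauss(0, 1) for _ in range(n_in)]

perm = list(range(n_hidden))
rng.shuffle(perm)
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

out = mlp(x, W1, b1, W2, b2)
out_permuted = mlp(x, W1p, b1p, W2p, b2)

# The two parameter sets differ as vectors but define the same function
# (up to floating-point summation order).
same_function = all(abs(a - b) < 1e-9 for a, b in zip(out, out_permuted))
```

Because many distinct parameter vectors represent the same function in this way, a method is free to choose the permutation that best aligns one network's parameters or gradients with another's, which is the degree of freedom the abstract refers to.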