Model fusion aims to integrate the knowledge of several deep neural network (DNN) models into one by fusing their parameters, with promising applications such as improving the generalization of foundation models and parameter averaging in federated learning. However, models trained under different settings (data, hyperparameters, etc.) exhibit diverse neuron permutations; in other words, from the perspective of the loss landscape, they reside in different loss basins, which hinders model fusion performance. To alleviate this issue, previous studies highlighted the role of permutation invariance and developed methods that find correct network permutations for neuron alignment after training. Orthogonal to these attempts, this paper studies training-time neuron alignment, which improves model fusion without the need for post-matching. Training-time alignment is cheaper than post-hoc alignment and is applicable to various model fusion scenarios. Starting from fundamental hypotheses and theorems, a simple yet lossless algorithm called TNA-PFN is introduced. TNA-PFN uses partially fixed neuron weights as anchors to reduce the space of potential permutations during training, and it is empirically validated to reduce the barriers of linear mode connectivity and multi-model fusion. It is also shown that TNA-PFN improves the fusion of pretrained models under the model soup (vision transformers) and ColD fusion (pretrained language models) settings. Based on TNA-PFN, two federated learning methods, FedPFN and FedPNU, are proposed, showing the prospects of training-time neuron alignment. FedPFN and FedPNU achieve state-of-the-art performance in federated learning under heterogeneous settings and are compatible with server-side algorithms.
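To make the idea of partially fixed neuron weights concrete, below is a minimal PyTorch sketch of one plausible realization: a shared random subset of each weight matrix is frozen via gradient masking so that it acts as a common anchor across independently trained models. The helper name `apply_partial_fix`, the fixing ratio, and the gradient-hook mechanism are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

def apply_partial_fix(model: nn.Module, fix_ratio: float = 0.5, seed: int = 0):
    """Freeze a randomly chosen subset of each weight matrix by zeroing its
    gradients; with a shared seed, the frozen entries are identical across
    models and serve as training-time alignment anchors (illustrative sketch)."""
    gen = torch.Generator().manual_seed(seed)  # same seed -> same anchors in every model
    for name, param in model.named_parameters():
        if param.dim() < 2:  # skip biases and normalization parameters
            continue
        mask = torch.rand(param.shape, generator=gen) < fix_ratio  # True = fixed entry
        # The hook multiplies incoming gradients by 0 on fixed entries,
        # so those weights keep their (shared) initialization throughout training.
        param.register_hook(lambda grad, m=mask: grad * (~m).to(grad.device).float())

# Usage: apply the same seed to every model before local/independent training.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
apply_partial_fix(model, fix_ratio=0.5, seed=42)
```

Because the anchored entries never move, independently trained models share a common reference frame, which is the intuition behind reducing training-time permutation mismatch before fusion.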