Model fusion aims to integrate the knowledge of several deep neural network (DNN) models into one by fusing their parameters, with promising applications such as improving the generalization of foundation models and parameter averaging in federated learning. However, models trained under different settings (data, hyperparameters, etc.) exhibit diverse neuron permutations; in other words, from the perspective of the loss landscape, they reside in different loss basins, which hinders model fusion performance. To alleviate this issue, previous studies highlighted the role of permutation invariance and developed methods that find correct network permutations for neuron alignment after training. Orthogonal to these attempts, this paper studies training-time neuron alignment, which improves model fusion without the need for post-matching. Training-time alignment is cheaper than post-hoc alignment and is applicable to various model fusion scenarios. Starting from fundamental hypotheses and theorems, a simple yet lossless algorithm called TNA-PFN is introduced. TNA-PFN uses partially fixed neuron weights as anchors to reduce the space of potential permutations during training, and it is empirically validated to reduce the barriers of linear mode connectivity and multi-model fusion. It is also shown that TNA-PFN improves the fusion of pretrained models under the model soup (vision transformers) and ColD fusion (pretrained language models) settings. Based on TNA-PFN, two federated learning methods, FedPFN and FedPNU, are proposed, showing the prospects of training-time neuron alignment. FedPFN and FedPNU achieve state-of-the-art performance in federated learning under heterogeneous settings and are compatible with server-side algorithms.
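To make the idea of partially fixed neuron weights concrete, below is a minimal PyTorch sketch of one plausible realization: a shared random subset of each weight matrix is frozen via gradient masking so that it acts as a common anchor across independently trained models. The helper name `apply_partial_fix`, the fixing ratio, and the gradient-hook mechanism are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

def apply_partial_fix(model: nn.Module, fix_ratio: float = 0.5, seed: int = 0):
    """Freeze a randomly chosen subset of each weight matrix by zeroing its
    gradients; with a shared seed, the frozen entries are identical across
    models and serve as training-time alignment anchors (illustrative sketch)."""
    gen = torch.Generator().manual_seed(seed)  # same seed -> same anchors in every model
    for name, param in model.named_parameters():
        if param.dim() < 2:  # skip biases and normalization parameters
            continue
        mask = torch.rand(param.shape, generator=gen) < fix_ratio  # True = fixed entry
        # The hook multiplies incoming gradients by 0 on fixed entries,
        # so those weights keep their (shared) initialization throughout training.
        param.register_hook(lambda grad, m=mask: grad * (~m).to(grad.device).float())

# Usage: apply the same seed to every model before local/independent training.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
apply_partial_fix(model, fix_ratio=0.5, seed=42)
```

Because the anchored entries never move, independently trained models share a common reference frame, which is the intuition behind reducing training-time permutation mismatch before fusion.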