Permutation symmetries of deep networks make basic operations such as model merging and similarity estimation challenging. In many cases, it is necessary to align the weights of the networks, i.e., to find optimal permutations between their weights. Unfortunately, weight alignment is an NP-hard problem. Prior research has mainly focused on solving relaxed versions of the alignment problem, leading to either time-consuming methods or sub-optimal solutions. To accelerate the alignment process and improve its quality, we propose Deep-Align, a novel framework for learning to solve the weight alignment problem. To that end, we first prove that weight alignment adheres to two fundamental symmetries and then propose a deep architecture that respects these symmetries. Notably, our framework does not require any labeled data. We provide a theoretical analysis of our approach and evaluate Deep-Align on several types of network architectures and learning setups. Our experimental results indicate that a single feed-forward pass with Deep-Align produces alignments that are better than or equivalent to those produced by current optimization algorithms. Additionally, our alignments can be used as an effective initialization for other methods, leading to improved solutions with a significant speedup in convergence.
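To make the permutation symmetry concrete, the following is a minimal NumPy sketch and not the Deep-Align method itself. It assumes a toy two-layer ReLU MLP with hypothetical sizes and names (`mlp`, `perm`, `match`), shows that reordering the hidden units leaves the network function unchanged, and recovers the permutation by nearest-row matching. That matching only works here because the second network is an exact permuted copy of the first; independently trained networks require the harder alignment problem the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 6, 3  # illustrative sizes, not from the paper

# Network A: y = W2 @ relu(W1 @ x + b1) + b2
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)


def mlp(x, W1, b1, W2, b2):
    """Forward pass of a two-layer ReLU MLP."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2


# Network B: the same function with its hidden units reordered.
# Rows of W1 and entries of b1 move together with the columns of W2.
perm = rng.permutation(d_hidden)
W1_p, b1_p, W2_p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=d_in)
same_function = np.allclose(mlp(x, W1, b1, W2, b2),
                            mlp(x, W1_p, b1_p, W2_p, b2))

# Toy alignment: match every hidden unit of B to its closest unit in A.
# dists[i, j] = distance between B's unit i and A's unit j.
dists = np.linalg.norm(W1_p[:, None, :] - W1[None, :, :], axis=-1)
match = dists.argmin(axis=1)   # match[i] = index in A of B's unit i
inverse = np.argsort(match)    # reorders B's units into A's order

W1_aligned = W1_p[inverse]
b1_aligned = b1_p[inverse]
W2_aligned = W2_p[:, inverse]

# After alignment, averaging the two networks reproduces network A exactly,
# because the aligned copy of B coincides with A.
W1_merged = 0.5 * (W1 + W1_aligned)
```

In this toy setting the weights of the two networks differ entry by entry even though the networks compute the same function, which is why merging or comparing weights is only meaningful after alignment.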