Permutation symmetries of deep networks make simple operations like model averaging and similarity estimation challenging. In many cases, aligning the weights of the networks, i.e., finding optimal permutations between their weights, is necessary. More generally, weight alignment is essential for a wide range of applications, from model merging, through exploring the optimization landscape of deep neural networks, to defining meaningful distance functions between neural networks. Unfortunately, weight alignment is an NP-hard problem. Prior research has mainly focused on solving relaxed versions of the alignment problem, which leads to methods that are either time-consuming or yield sub-optimal solutions. To accelerate the alignment process and improve its quality, we propose Deep-Align, a novel framework that learns to solve the weight alignment problem. To that end, we first demonstrate that weight alignment adheres to two fundamental symmetries and then propose a deep architecture that respects these symmetries. Notably, our framework does not require any labeled data. We provide a theoretical analysis of our approach and evaluate Deep-Align on several types of network architectures and learning setups. Our experimental results indicate that a feed-forward pass with Deep-Align produces alignments that are better than or equivalent to those produced by current optimization algorithms. Additionally, our alignments can be used as an initialization for other methods to obtain even better solutions with a significant speedup in convergence.
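To make the permutation symmetry concrete, the following minimal sketch builds a two-layer ReLU MLP, shuffles its hidden units, and recovers the shuffle with a linear assignment solver. This is not Deep-Align: it is a classical optimization baseline for the simplest possible case, and the helper names (`mlp`, `permute_hidden`, `align_hidden`) and the choice of squared Euclidean cost are assumptions made for illustration only. With a single hidden layer the matching reduces to one assignment problem; in deeper networks the permutations of adjacent layers are coupled, which is where the general problem becomes hard.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP with a ReLU hidden layer."""
    return np.maximum(x @ W1.T + b1, 0.0) @ W2.T + b2


def permute_hidden(W1, b1, W2, perm):
    """Reorder the hidden units; the function computed by the network is unchanged."""
    return W1[perm], b1[perm], W2[:, perm]


def align_hidden(net_a, net_b):
    """Return perm such that net_b's hidden units, reordered by perm, best match net_a's.

    Illustrative baseline (an assumption, not the paper's method): each hidden
    unit is described by its incoming weights, bias, and outgoing weights, and
    units are matched by minimizing the total squared distance.
    """
    W1a, b1a, W2a = net_a
    W1b, b1b, W2b = net_b
    feat_a = np.hstack([W1a, b1a[:, None], W2a.T])
    feat_b = np.hstack([W1b, b1b[:, None], W2b.T])
    # cost[i, j] = squared distance between unit i of net_a and unit j of net_b
    cost = ((feat_a[:, None, :] - feat_b[None, :, :]) ** 2).sum(axis=-1)
    _, cols = linear_sum_assignment(cost)
    return cols


rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 6, 3
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)

# Shuffle the hidden units to obtain a functionally identical network.
perm = rng.permutation(d_hidden)
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

x = rng.normal(size=(5, d_in))
same_function = np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))

# Recover the permutation and undo it.
recovered = align_hidden((W1, b1, W2), (W1p, b1p, W2p))
W1r, b1r, W2r = permute_hidden(W1p, b1p, W2p, recovered)
weights_match = np.allclose(W1r, W1) and np.allclose(b1r, b1) and np.allclose(W2r, W2)
```

In this toy setting the two networks differ only by a permutation, so the assignment recovers it exactly. For independently trained networks the match is only approximate, and the learned, feed-forward approach described in the abstract targets that harder case.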