Permutation symmetries of deep networks make basic operations such as model merging and similarity estimation challenging. In many cases, it is necessary to align the weights of the networks, i.e., to find optimal permutations between their weights. Unfortunately, weight alignment is an NP-hard problem. Prior research has mainly focused on solving relaxed versions of the alignment problem, leading to either time-consuming methods or sub-optimal solutions. To accelerate the alignment process and improve its quality, we propose Deep-Align, a novel framework that learns to solve the weight alignment problem. To that end, we first prove that weight alignment adheres to two fundamental symmetries and then propose a deep architecture that respects these symmetries. Notably, our framework does not require any labeled data. We provide a theoretical analysis of our approach and evaluate Deep-Align on several types of network architectures and learning setups. Our experimental results indicate that a feed-forward pass with Deep-Align produces alignments that are better than or equivalent to those produced by current optimization algorithms. Additionally, our alignments can be used as an effective initialization for other methods, leading to improved solutions with a significant speedup in convergence.
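To make the problem setting concrete, the following minimal sketch illustrates the permutation symmetry of a small two-layer ReLU network and a brute-force alignment of its single hidden layer. It is not the Deep-Align method. All names (`forward`, `permute_hidden`, `alignment_cost`, `brute_force_align`) and the squared-distance objective are assumptions chosen for illustration. The exhaustive search enumerates all n! orderings of the hidden units, so it is only feasible for tiny layers.

```python
import itertools
import random


def forward(model, x):
    """Two-layer MLP: W2 @ relu(W1 @ x + b1), written with plain lists."""
    W1, b1, W2 = model
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


def permute_hidden(model, perm):
    """Reorder hidden units: rows of W1, entries of b1, columns of W2."""
    W1, b1, W2 = model
    return ([W1[p] for p in perm],
            [b1[p] for p in perm],
            [[row[p] for p in perm] for row in W2])


def flatten(model):
    """Concatenate all parameters into one flat list."""
    W1, b1, W2 = model
    return [w for row in W1 for w in row] + list(b1) + [w for row in W2 for w in row]


def alignment_cost(model_a, model_b, perm):
    """Squared parameter distance after reordering model_b's hidden units (illustrative objective)."""
    pa = flatten(model_a)
    pb = flatten(permute_hidden(model_b, perm))
    return sum((u - v) ** 2 for u, v in zip(pa, pb))


def brute_force_align(model_a, model_b):
    """Try every ordering of the hidden units and keep the cheapest one."""
    n_hidden = len(model_a[1])
    return min(itertools.permutations(range(n_hidden)),
               key=lambda p: alignment_cost(model_a, model_b, p))


rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 2
model_a = (
    [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)],
    [rng.gauss(0, 1) for _ in range(n_hidden)],
    [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_out)],
)

# model_b is model_a with its hidden units shuffled: different weights, same function.
secret = (2, 0, 3, 1)
model_b = permute_hidden(model_a, secret)

x = [0.5, -1.0, 2.0]
out_a, out_b = forward(model_a, x), forward(model_b, x)

best_perm = brute_force_align(model_a, model_b)
best_cost = alignment_cost(model_a, model_b, best_perm)
print("outputs match:", all(abs(u - v) < 1e-9 for u, v in zip(out_a, out_b)))
print("recovered permutation:", best_perm, "cost:", best_cost)
```

In this toy case the two models compute the same function even though their weight tensors differ, and the search recovers the inverse of the shuffling permutation. With several hidden layers the permutations are coupled across layers, which is the setting the abstract describes as NP-hard.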