Many algorithms and observed phenomena in deep learning appear to be affected by parameter symmetries: transformations of neural network parameters that do not change the underlying neural network function. These include linear mode connectivity, model merging, Bayesian neural network inference, metanetworks, and several other characteristics of optimization or loss landscapes. However, theoretical analysis of the relationship between parameter space symmetries and these phenomena is difficult. In this work, we empirically investigate the impact of neural parameter symmetries by introducing new neural network architectures that have reduced parameter space symmetries. We develop two methods, with some provable guarantees, for modifying standard neural networks to reduce parameter space symmetries. With these new methods, we conduct a comprehensive experimental study consisting of multiple tasks aimed at assessing the effect of removing parameter symmetries. Our experiments reveal several interesting observations about the empirical impact of parameter symmetries; for instance, we observe linear mode connectivity between our networks without alignment of weight spaces, and we find that our networks allow for faster and more effective Bayesian neural network training.
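To make the notion of a parameter symmetry concrete, the following is a minimal illustrative sketch (plain NumPy, not the paper's implementation) of the best-known example: permuting the hidden neurons of a two-layer ReLU MLP yields a different parameter vector that computes exactly the same function.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3

# Parameters of a two-layer ReLU network f(x) = W2 relu(W1 x + b1).
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))

def mlp(x, W1, b1, W2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

# A permutation symmetry of the hidden layer:
# (W1, b1, W2) -> (P W1, P b1, W2 P^T) for a permutation matrix P.
# Since relu acts elementwise, it commutes with P, and P^T P = I,
# so the network output is unchanged.
perm = rng.permutation(d_hidden)
P = np.eye(d_hidden)[perm]

x = rng.normal(size=d_in)
y_original = mlp(x, W1, b1, W2)
y_permuted = mlp(x, P @ W1, P @ b1, W2 @ P.T)
assert np.allclose(y_original, y_permuted)  # identical outputs, distinct parameters
```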
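For reference, the sketch below shows how linear mode connectivity is typically measured: evaluate the loss along the straight line between two trained parameter vectors and compute the barrier, i.e., the worst excess loss above the linear interpolation of the endpoint losses. This is a generic sketch of the standard measurement under stated assumptions, not the paper's own evaluation code; `loss_fn` and the parameter lists are placeholders.

```python
import numpy as np

def interpolation_losses(params_a, params_b, loss_fn, num_points=11):
    """Loss of (1 - t) * theta_a + t * theta_b for t in [0, 1].

    params_a, params_b: lists of numpy arrays with matching shapes.
    loss_fn: placeholder callable mapping a parameter list to a scalar loss.
    """
    losses = []
    for t in np.linspace(0.0, 1.0, num_points):
        interpolated = [(1.0 - t) * pa + t * pb
                        for pa, pb in zip(params_a, params_b)]
        losses.append(loss_fn(interpolated))
    return losses

def loss_barrier(losses):
    # Barrier: max over t of L(interpolated) minus the straight line
    # between the two endpoint losses; (near-)zero indicates linear
    # mode connectivity.
    endpoints = np.linspace(losses[0], losses[-1], len(losses))
    return float(np.max(np.array(losses) - endpoints))

# Toy usage with a convex quadratic "loss", where interpolation between
# any two parameter settings never rises above the endpoint line.
toy_loss = lambda params: float(np.sum(params[0] ** 2))
a, b = [np.ones((2, 2))], [-np.ones((2, 2))]
print(loss_barrier(interpolation_losses(a, b, toy_loss)))  # 0.0
```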