Exploring the loss landscape offers insights into the inherent principles of deep neural networks (DNNs). Recent work suggests that, beyond the familiar flat and sharp valleys, there also exist asymmetric valleys, yet their causes and implications have not been thoroughly examined. Our study systematically explores the factors affecting the symmetry of DNN valleys, encompassing (1) the dataset, network architecture, initialization, and hyperparameters that influence the convergence point; and (2) the magnitude and direction of the noise used for 1D visualization. Our central observation is that the {\it degree of sign consistency} between the noise and the convergence point is a critical indicator of valley symmetry. Theoretical analyses of the ReLU activation and the softmax function explain this phenomenon. Our discovery enables new understanding and applications in the scenario of model fusion: (1) the efficacy of interpolating separate models correlates significantly with their sign consistency ratio, and (2) imposing sign alignment during federated learning emerges as a novel approach to model parameter alignment.
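To make the two central quantities concrete, the sketch below (our own minimal illustration, not code from the paper; the helpers `sign_consistency` and `loss_along_direction` are hypothetical names) shows how one might compute the sign consistency ratio between a noise direction and a converged parameter vector, and trace the 1D loss slice $\theta + \alpha d$ used for valley visualization.

\begin{verbatim}
import torch

def sign_consistency(theta: torch.Tensor, d: torch.Tensor) -> float:
    """Fraction of coordinates where the noise direction d and the
    converged parameter vector theta share the same sign."""
    return (torch.sign(d) == torch.sign(theta)).float().mean().item()

@torch.no_grad()
def loss_along_direction(model, loss_fn, data, target, d, alphas):
    """Evaluate the loss at theta + alpha * d for each alpha (1D slice)."""
    theta = torch.nn.utils.parameters_to_vector(model.parameters()).clone()
    losses = []
    for alpha in alphas:
        torch.nn.utils.vector_to_parameters(theta + alpha * d,
                                            model.parameters())
        losses.append(loss_fn(model(data), target).item())
    torch.nn.utils.vector_to_parameters(theta, model.parameters())  # restore
    return losses

# Toy usage: a random Gaussian direction has ~50% sign consistency with
# theta, whereas a sign-aligned direction has 100% by construction.
model = torch.nn.Linear(10, 2)
theta = torch.nn.utils.parameters_to_vector(model.parameters())
d_random = torch.randn_like(theta)
d_aligned = torch.sign(theta) * d_random.abs()
print(sign_consistency(theta, d_random))   # approx. 0.5
print(sign_consistency(theta, d_aligned))  # exactly 1.0
\end{verbatim}

Comparing the loss curves returned for `d_random` and `d_aligned` over a symmetric grid of `alphas` (e.g., from $-1$ to $1$) is one way to probe how sign consistency relates to valley asymmetry.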