Enforcing orthonormality or isometry on the weight matrices has been shown to improve the training of deep neural networks by mitigating exploding and vanishing gradients and by increasing the robustness of the learned networks. Despite this empirical success, the theoretical analysis of orthonormality in neural networks remains limited; for example, it is not well understood how orthonormality affects the convergence of training. In this letter, we aim to bridge this gap by providing a convergence analysis for training orthonormal deep linear neural networks. Specifically, we show that, for a class of loss functions, Riemannian gradient descent with an appropriate initialization converges at a linear rate when training orthonormal deep linear neural networks. Unlike existing works that enforce orthonormal weight matrices in all layers, our approach lifts this requirement for one layer, which is crucial for establishing the convergence guarantee. Our results shed light on how increasing the number of hidden layers affects the convergence speed. Experimental results validate our theoretical analysis.
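To make the setting concrete, the following is a minimal sketch of Riemannian gradient descent on a deep linear network in which every layer except one is constrained to be orthonormal. It is an illustration under stated assumptions, not the exact algorithm analyzed in this letter: the squared loss, the square layer shapes, the choice of the last layer as the unconstrained one, the polar retraction, the step size, and the initialization near a synthetic teacher network are all assumptions introduced here, and the function names are hypothetical.

```python
import numpy as np

def polar_retract(Y):
    # Map a matrix back onto the set of orthonormal matrices via its polar factor.
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ Vt

def riemannian_grad(W, G):
    # Project the Euclidean gradient G onto the tangent space at W
    # (Stiefel manifold with the Euclidean metric): G - W * sym(W^T G).
    WtG = W.T @ G
    return G - W @ (WtG + WtG.T) / 2.0

def loss_and_grads(Ws, X, Y):
    # Loss 0.5 * ||W_N ... W_1 X - Y||_F^2 and its Euclidean gradients per layer.
    acts = [X]
    for W in Ws:
        acts.append(W @ acts[-1])
    R = acts[-1] - Y
    loss = 0.5 * np.sum(R ** 2)
    grads = [None] * len(Ws)
    back = R  # holds (W_N ... W_{i+1})^T R while walking backwards
    for i in reversed(range(len(Ws))):
        grads[i] = back @ acts[i].T
        back = Ws[i].T @ back
    return loss, grads

def train(Ws, X, Y, lr=0.02, steps=500):
    # Assumption: the last layer is the unconstrained one; all others stay orthonormal.
    Ws = [W.copy() for W in Ws]
    losses = []
    for _ in range(steps):
        loss, grads = loss_and_grads(Ws, X, Y)
        losses.append(loss)
        for i in range(len(Ws) - 1):
            xi = riemannian_grad(Ws[i], grads[i])
            Ws[i] = polar_retract(Ws[i] - lr * xi)  # Riemannian step + retraction
        Ws[-1] = Ws[-1] - lr * grads[-1]            # plain gradient step
    losses.append(loss_and_grads(Ws, X, Y)[0])
    return Ws, losses

def run_demo(d=4, m=50, depth=3, seed=0):
    # Synthetic, realizable problem: targets come from a hypothetical teacher network.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((d, m)) / np.sqrt(m)
    teacher = [polar_retract(rng.standard_normal((d, d))) for _ in range(depth - 1)]
    teacher.append(rng.standard_normal((d, d)) / np.sqrt(d))
    Y = X
    for W in teacher:
        Y = W @ Y
    # Assumption: "appropriate initialization" is modeled as a small perturbation of the teacher.
    init = [polar_retract(W + 0.1 * rng.standard_normal((d, d))) for W in teacher[:-1]]
    init.append(teacher[-1] + 0.1 * rng.standard_normal((d, d)))
    return train(init, X, Y)

Ws, losses = run_demo()
print("initial loss:", losses[0], "final loss:", losses[-1])
```

Plotting `losses` on a logarithmic axis for several values of `depth` is one way to inspect empirically how the number of layers affects the convergence speed in this toy setting.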