Appropriate weight initialization, together with the ReLU activation function, has become a cornerstone of modern deep learning, making it possible to train and deploy effective and efficient neural network models across diverse areas of artificial intelligence. The dying ReLU problem, in which ReLU neurons become inactive and output zero, poses a significant challenge when training deep neural networks with the ReLU activation function. Theoretical analyses and a variety of methods have been proposed to address this problem. Even so, training remains difficult for extremely deep and narrow feedforward networks with the ReLU activation function. In this paper, we propose a new weight initialization method to address this issue. We prove properties of the proposed initial weight matrix and show how these properties facilitate the effective propagation of signal vectors. Through a series of experiments and comparisons with existing methods, we demonstrate the effectiveness of the new initialization method.
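To make the dying ReLU problem in deep and narrow networks concrete, the following is a minimal NumPy sketch, not the method proposed in this paper. It assumes a bias-free feedforward ReLU network with square weight matrices drawn from a He-style Gaussian (variance 2/width) as a baseline, and the function name `dead_fraction` and all of its parameters are hypothetical. It estimates how often such a randomly initialized network outputs exactly zero for every input in a random batch.

```python
import numpy as np


def dead_fraction(depth, width, n_trials=200, n_inputs=32, seed=0):
    """Estimate the fraction of randomly initialized ReLU networks that
    output all zeros for every input in a random batch.

    Assumptions (illustrative only): no biases, square layers of size
    `width`, and He-style Gaussian weights with variance 2 / width.
    """
    rng = np.random.default_rng(seed)
    dead = 0
    for _ in range(n_trials):
        # Random batch of input signal vectors, one per row.
        h = rng.standard_normal((n_inputs, width))
        for _ in range(depth):
            W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
            h = np.maximum(h @ W, 0.0)  # ReLU
            if not h.any():
                # Without biases, an all-zero layer stays zero forever.
                break
        dead += not h.any()
    return dead / n_trials


if __name__ == "__main__":
    for depth in (1, 10, 50):
        for width in (2, 16):
            frac = dead_fraction(depth, width)
            print(f"depth={depth:3d} width={width:2d} dead fraction={frac:.2f}")
```

Under these assumptions, the width-2 networks should collapse far more often as depth grows than the width-16 networks do, which is the regime the abstract describes as hard to train.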