In this paper, we first identify activation shift, a simple but remarkable phenomenon in neural networks: the preactivation value of a neuron has a non-zero mean that depends on the angle between the neuron's weight vector and the mean of the activation vector in the previous layer. We then propose linearly constrained weights (LCW) to reduce the activation shift in both fully connected and convolutional layers. We study the impact of reducing the activation shift from the perspective of how the variance of variables in the network changes through layer operations in both the forward and backward chains, and we discuss its relationship to the vanishing gradient problem. Experimental results show that LCW enables a deep feedforward network with sigmoid activation functions to be trained efficiently by resolving the vanishing gradient problem. Moreover, combined with batch normalization, LCW improves the generalization performance of both feedforward and convolutional networks.
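The activation shift described above can be illustrated numerically. The following is a minimal NumPy sketch, not the paper's implementation. It assumes that the linear constraint takes the form "the elements of each weight vector sum to zero", which the abstract itself does not spell out. The helper names (`preactivation_mean`, `cos_angle`, `project_zero_sum`), the uniform activations standing in for sigmoid outputs, and the weight offset of 0.05 are all hypothetical choices made for illustration.

```python
import numpy as np

def preactivation_mean(w, a):
    """Empirical mean of the preactivation z = w . a over the rows of a."""
    return float(np.mean(a @ w))

def cos_angle(u, v):
    """Cosine of the angle between vectors u and v."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def project_zero_sum(w):
    """Assumed form of the linear constraint: make the elements of w sum to zero."""
    return w - w.mean()

rng = np.random.default_rng(0)
n_samples, dim = 10_000, 256

# Stand-in for sigmoid outputs: every unit lies in (0, 1) with mean near 0.5,
# so the mean activation vector is roughly proportional to the all-ones vector.
a = rng.uniform(0.0, 1.0, size=(n_samples, dim))
a_mean = a.mean(axis=0)

# Hypothetical weight vector whose elements have a small positive mean.
w = rng.normal(0.0, 1.0 / np.sqrt(dim), size=dim) + 0.05
w_lcw = project_zero_sum(w)

# By linearity, mean(z) = w . a_mean = |w| |a_mean| cos(angle), so the shift
# shrinks as w becomes orthogonal to the mean activation vector.
for name, vec in [("unconstrained", w), ("zero-sum", w_lcw)]:
    print(name,
          "cos(angle) =", cos_angle(vec, a_mean),
          "mean preactivation =", preactivation_mean(vec, a))
```

Under these assumptions, the zero-sum weight vector is nearly orthogonal to the mean activation vector, so its mean preactivation is close to zero, whereas the unconstrained vector has a clearly non-zero mean.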