We examine the characteristic activation values of individual ReLU units in neural networks. We call the set of points in the input space that correspond to these values the characteristic activation set of a ReLU unit. We draw an explicit connection between the characteristic activation set and the features learned by ReLU networks. This connection yields new insights into why the normalization techniques used in modern deep learning architectures regularize and stabilize SGD optimization. Building on these insights, we propose a geometric approach to parameterizing ReLU networks for improved feature learning. We empirically verify its usefulness under less carefully chosen initialization schemes and larger learning rates, and we report improved optimization stability, faster convergence, and better generalization performance.