This work examines the characteristic activation values of individual ReLU units in neural networks. We refer to the set of input locations corresponding to such characteristic activation values as the characteristic activation set of a ReLU unit. We draw an explicit connection between the characteristic activation set and learned features in ReLU networks. This connection leads to new insights into how various neural network normalization techniques used in modern deep learning architectures regularize and stabilize stochastic gradient optimization. Utilizing these insights, we propose a geometric parameterization for ReLU networks to improve feature learning, which decouples the radial and angular parameters in the hyperspherical coordinate system. We empirically verify its usefulness under less carefully chosen initialization schemes and larger learning rates. We report significant improvements in optimization stability, convergence speed, and generalization performance for various models on a variety of datasets, including the ResNet-50 network on ImageNet.
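To make the idea of decoupling radial and angular parameters concrete, the following is a minimal NumPy sketch of one plausible form of such a parameterization. The abstract does not state the formula, so everything here is an assumption for illustration: the names `unit_vector`, `relu_unit`, and `to_weight_bias`, the symbols `r` (scale), `theta` (hyperspherical angles), and `lam` (signed offset of the activation boundary from the origin), and the specific form ReLU(r (u(theta)·x − lam)).

```python
import numpy as np


def unit_vector(theta):
    """Map n-1 hyperspherical angles to a unit vector in R^n (assumed convention)."""
    theta = np.asarray(theta, dtype=float)
    u = np.ones(theta.size + 1)
    # Component i carries the product sin(theta_0) * ... * sin(theta_{i-1}).
    u[1:] = np.cumprod(np.sin(theta))
    # Every component except the last is also multiplied by cos(theta_i).
    u[:-1] *= np.cos(theta)
    return u


def relu_unit(x, r, theta, lam):
    """Hypothetical geometrically parameterized ReLU unit.

    theta sets the direction of the unit, lam sets where the pre-activation
    crosses zero along that direction, and r only rescales the output.
    """
    u = unit_vector(theta)
    return np.maximum(0.0, r * (x @ u - lam))


def to_weight_bias(r, theta, lam):
    """Equivalent standard parameters (w, b) such that ReLU(w . x + b) matches."""
    u = unit_vector(theta)
    return r * u, -r * lam
```

Under these assumptions, the pre-activation is zero on the hyperplane u(theta)·x = lam, so changing `r` rescales the output without moving that boundary, whereas in the standard (w, b) form the boundary location depends on both the weight norm and the bias.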