We examine the characteristic activation values of individual ReLU units in neural networks. We refer to the set of points in the input space that correspond to these characteristic activation values as the characteristic activation set of a ReLU unit. We draw an explicit connection between the characteristic activation set and the features learned by ReLU networks. This connection offers new insights into why the normalization techniques used in modern deep learning architectures regularize and stabilize SGD optimization. Building on these insights, we propose a geometric approach to parameterizing ReLU networks for improved feature learning. We empirically verify its usefulness under less carefully chosen initialization schemes and larger learning rates, and report improved optimization stability, faster convergence, and better generalization performance.
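The abstract does not spell out the definitions, so the following Python sketch rests on stated assumptions. It assumes a ReLU unit computes max(0, w^T x + b) and takes its characteristic activation set to be the hyperplane {x : w^T x + b = 0}, where the unit switches between inactive and active. The function `polar_weights` is a hypothetical two-dimensional stand-in for a geometric parameterization, not the method proposed here. The sketch illustrates one way to read the claimed connection. Under the standard parameterization, the point of the hyperplane closest to the origin is -b w / ||w||^2. When ||w|| is small, a tiny additive update to w can therefore move and rotate the set substantially. Updating an explicit angle theta, by contrast, rotates it by exactly that amount.

```python
import math


def pre_activation(w, b, x):
    """Pre-activation w^T x + b of a single ReLU unit (any dimension)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def relu_unit(w, b, x):
    """Output of the ReLU unit: max(0, w^T x + b)."""
    return max(0.0, pre_activation(w, b, x))


def characteristic_point(w, b):
    """Point of the hyperplane {x : w^T x + b = 0} closest to the origin.

    Assumes the characteristic activation set is this hyperplane; the
    closest point is -b * w / ||w||^2, so its distance |b| / ||w|| blows
    up as ||w|| shrinks.
    """
    sq_norm = sum(wi * wi for wi in w)
    return [-b * wi / sq_norm for wi in w]


def polar_weights(lam, theta):
    """Hypothetical 2-D geometric parameterization: w = lam * (cos theta, sin theta).

    The direction of the hyperplane normal is the explicit parameter theta,
    decoupled from the scale lam.
    """
    return [lam * math.cos(theta), lam * math.sin(theta)]


def direction_change(w_old, w_new):
    """Angle in radians between two 2-D weight vectors, i.e. how far the normal rotates."""
    dot = w_old[0] * w_new[0] + w_old[1] * w_new[1]
    cross = w_old[0] * w_new[1] - w_old[1] * w_new[0]
    return math.atan2(abs(cross), dot)


if __name__ == "__main__":
    b = 0.01
    w = [0.01, 0.0]       # small-norm weight vector (standard parameterization)
    step = [0.0, 0.01]    # a small additive update, e.g. one SGD step on w
    w_new = [wi + si for wi, si in zip(w, step)]

    print("closest point before:", characteristic_point(w, b))
    print("closest point after: ", characteristic_point(w_new, b))
    print("rotation from additive step (rad):", direction_change(w, w_new))

    # Same-sized update applied to the angle of the hypothetical parameterization.
    w_polar_old = polar_weights(0.01, 0.0)
    w_polar_new = polar_weights(0.01, 0.01)
    print("rotation from angle step (rad):", direction_change(w_polar_old, w_polar_new))
```

Running the script shows that an additive step of size 0.01 applied to w = (0.01, 0) moves the closest point from (-1, 0) to (-0.5, -0.5) and rotates the normal by pi/4 radians, while a 0.01 rad change in theta rotates it by only 0.01 rad.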