Various normalization layers have recently been proposed to stabilize the training of deep neural networks. Among them, group normalization generalizes layer normalization and instance normalization by leaving the number of groups as a free hyperparameter. However, determining the optimal number of groups requires trial-and-error hyperparameter tuning, and such experiments are time-consuming. In this study, we discuss a reasonable method for setting the number of groups. First, we find that the number of groups influences the gradient behavior of the group normalization layer. Based on this observation, we derive the ideal number of groups, which calibrates the gradient scale to facilitate gradient descent optimization. The proposed number of groups is theoretically grounded and architecture-aware, and it provides a suitable value for every layer in a layer-wise manner. The proposed method improved performance over existing methods across numerous neural network architectures, tasks, and datasets.
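The statement that group normalization generalizes layer normalization and instance normalization can be checked numerically. The following is a minimal NumPy sketch under stated assumptions: the helper names `group_norm`, `layer_norm`, and `instance_norm` are illustrative, inputs are laid out as (N, C, H, W), and the learnable affine parameters are omitted. It does not implement the rule for choosing the number of groups derived in this study; it only illustrates the two limiting cases of the group count.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Normalize x of shape (N, C, H, W) within each group of channels (no affine)."""
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    # Split the channel axis into (groups, channels per group).
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

def layer_norm(x, eps=1e-5):
    """Normalize over all channels and spatial positions of each sample."""
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def instance_norm(x, eps=1e-5):
    """Normalize over the spatial positions of each channel separately."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 4, 4))  # hypothetical batch: N=2, C=8, H=W=4

# One group spanning all channels reproduces layer normalization.
gn_as_ln = np.allclose(group_norm(x, 1), layer_norm(x))
# One group per channel reproduces instance normalization.
gn_as_in = np.allclose(group_norm(x, 8), instance_norm(x))
```

Any divisor of the channel count between these two extremes, for example `group_norm(x, 4)`, gives an intermediate grouping; this is the free hyperparameter whose value the study aims to set without trial and error.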