Deep neural networks achieve high performance across many domains, but they can still generalize poorly when optimization is influenced by small or noisy gradient components. Sharpness-Aware Minimization (SAM) improves generalization by perturbing parameters toward directions of high curvature. However, it builds the perturbation from the entire gradient vector, so small or noisy components can distort the ascent step and cause the optimizer to miss better solutions. We propose Z-Score Filtered Sharpness-Aware Minimization, which applies Z-score-based filtering to the gradients of each layer. Rather than using all gradient components, the method constructs a mask that retains only the components whose absolute Z-scores fall in the top percentile. The percentile threshold $Q_p$ determines how many components are kept, so the ascent step concentrates on the directions that stand out most relative to the layer's average gradient. This selective perturbation refines the search toward flatter minima while reducing the influence of less significant gradients. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet with architectures including ResNet, VGG, and Vision Transformers show that the proposed method consistently improves test accuracy compared to SAM and its variants. The code repository is available at: https://github.com/YUNBLAK/Sharpness-Aware-Minimization-with-Z-Score-Gradient-Filtering
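The per-layer filtering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that $Q_p$ is a percentile in [0, 100] and that components whose absolute Z-score lies at or above that percentile are kept, that the Z-score uses the layer's population mean and standard deviation, and that the ascent step is the usual SAM form $\rho \, g / \lVert g \rVert_2$ applied to the masked gradient of a single layer. The names `zscore_mask`, `filtered_ascent_step`, `q_p`, and `rho` are hypothetical.

```python
import math
import statistics


def zscore_mask(grad, q_p, eps=1e-12):
    """Return a 0/1 mask over one layer's gradient components.

    Assumption: components whose |Z-score| is at or above the q_p-th
    percentile of the layer are kept (mask 1.0), the rest are dropped.
    """
    mean = statistics.fmean(grad)
    std = statistics.pstdev(grad)  # population standard deviation of the layer
    abs_z = [abs(g - mean) / (std + eps) for g in grad]
    # Keep the top (100 - q_p) percent of components, and at least one.
    keep = max(1, math.ceil(len(grad) * (100 - q_p) / 100))
    threshold = sorted(abs_z, reverse=True)[keep - 1]
    return [1.0 if z >= threshold else 0.0 for z in abs_z]


def filtered_ascent_step(grad, q_p, rho=0.05, eps=1e-12):
    """Perturbation built only from the components that pass the mask.

    Assumption: the masked gradient is rescaled to L2 norm rho, as in the
    standard SAM ascent step, using this single layer's norm.
    """
    mask = zscore_mask(grad, q_p)
    masked = [m * g for m, g in zip(mask, grad)]
    norm = math.sqrt(sum(v * v for v in masked))
    return [rho * v / (norm + eps) for v in masked]


# Toy layer gradient: two components stand out from the rest.
layer_grad = [0.1, -0.2, 0.15, 5.0, -4.0, 0.05, 0.0, 0.12, -0.1, 0.08]
print(zscore_mask(layer_grad, q_p=80))          # keeps indices 3 and 4
print(filtered_ascent_step(layer_grad, q_p=80))
```

With `q_p=80`, only the two outlying components contribute to the perturbation, while the small components are zeroed out before normalization. Ties at the threshold are all kept in this sketch, so slightly more than the requested fraction can survive.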