Sharpness-aware minimization (SAM) was proposed to reduce the sharpness of minima and has been shown to enhance generalization performance in various settings. In this work we show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters. This finding generalizes to different SAM variants and to both ResNet (Batch Normalization) and Vision Transformer (Layer Normalization) architectures. We consider alternative sparse perturbation approaches and find that they do not achieve similar performance gains at such extreme sparsity levels, showing that this behaviour is unique to the normalization layers. Although our findings reaffirm the effectiveness of SAM in improving generalization performance, they cast doubt on whether this improvement is caused solely by reduced sharpness.
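To make the restricted adversarial step concrete, the following is a minimal sketch of one SAM-style update in which only a chosen subset of parameters is perturbed. It is an illustration under stated assumptions, not the authors' implementation: the names `perturb`, `sam_step`, and `toy_grad`, the scalar toy model, and the values of `rho` and `lr` are all hypothetical, and normalizing the perturbation by the gradient norm of the selected subset only is an assumption of this sketch.

```python
import math


def perturb(params, grads, perturb_keys, rho=0.05):
    """Ascent step: move only the parameters named in `perturb_keys`.

    Assumption of this sketch: the perturbation is scaled by the gradient
    norm of the selected subset, so its total length is (almost exactly) rho.
    """
    norm = math.sqrt(sum(grads[k] ** 2 for k in perturb_keys))
    scale = rho / (norm + 1e-12)  # small constant avoids division by zero
    perturbed = dict(params)
    for k in perturb_keys:
        perturbed[k] = params[k] + scale * grads[k]
    return perturbed


def sam_step(params, grad_fn, perturb_keys, rho=0.05, lr=0.1):
    """One update: gradient taken at the perturbed point, applied to the
    original (unperturbed) parameters. All parameters are still updated."""
    grads = grad_fn(params)
    perturbed = perturb(params, grads, perturb_keys, rho)
    sam_grads = grad_fn(perturbed)
    return {k: params[k] - lr * sam_grads[k] for k in params}


def toy_grad(params, x=2.0, y=3.0):
    """Gradient of the squared error of a hypothetical scalar model
    pred = gamma * (w * x) + beta, where gamma and beta stand in for the
    affine normalization parameters and w for an ordinary weight."""
    residual = params["gamma"] * params["w"] * x + params["beta"] - y
    return {
        "w": 2.0 * residual * params["gamma"] * x,
        "gamma": 2.0 * residual * params["w"] * x,
        "beta": 2.0 * residual,
    }


params = {"w": 0.5, "gamma": 1.0, "beta": 0.0}
# Perturb only the stand-ins for the normalization parameters.
updated = sam_step(params, toy_grad, perturb_keys=["gamma", "beta"])
```

Passing every key in `perturb_keys` gives the usual all-parameter perturbation, so the two settings compared in the abstract differ only in that argument.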