Sharpness-aware minimization (SAM) was proposed to reduce the sharpness of minima and has been shown to enhance generalization performance in various settings. In this work we show that perturbing only the affine normalization parameters (comprising less than 0.1% of the total parameters) in the adversarial step of SAM outperforms perturbing all of the parameters. This finding generalizes to different SAM variants and to both ResNet (Batch Normalization) and Vision Transformer (Layer Normalization) architectures. We consider alternative sparse perturbation approaches and find that these do not achieve a similar performance enhancement at such extreme sparsity levels, showing that this behaviour is unique to the normalization layers. Although our findings reaffirm the effectiveness of SAM in improving generalization performance, they cast doubt on whether this is solely caused by reduced sharpness. The code for our experiments is publicly available at https://github.com/mueller-mp/SAM-ON.
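The idea of restricting the adversarial step to the normalization parameters can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation (that is in the linked repository). The following are assumptions made for illustration: the function name `sam_on_step`, the boolean mask `is_norm`, the toy quadratic loss, the values of `rho` and `lr`, and the standard SAM recipe of a normalized-gradient ascent step of radius `rho` followed by a descent step on all parameters.

```python
import math


def sam_on_step(params, grad_fn, is_norm, rho=0.05, lr=0.1):
    """One SAM-style update whose ascent perturbation is restricted to the
    entries flagged in `is_norm` (hypothetical mask marking the affine
    normalization parameters). All parameters are still updated in the
    descent step, which is an assumption based on standard SAM."""
    grads = grad_fn(params)
    # Zero out the ascent direction for every non-normalization parameter.
    masked = [g if m else 0.0 for g, m in zip(grads, is_norm)]
    norm = math.sqrt(sum(g * g for g in masked))
    # Scale the masked gradient to length rho (epsilon avoids division by zero).
    scale = rho / (norm + 1e-12)
    perturbed = [p + scale * g for p, g in zip(params, masked)]
    # Evaluate the gradient at the perturbed point and apply it to the
    # original (unperturbed) parameters.
    sam_grads = grad_fn(perturbed)
    return [p - lr * g for p, g in zip(params, sam_grads)]


# Toy loss L(p) = 0.5 * sum(c_i * p_i^2), so the gradient is c_i * p_i.
CURVATURE = [1.0, 4.0, 2.0]


def toy_grad(params):
    return [c * p for c, p in zip(CURVATURE, params)]


params = [1.0, -2.0, 0.5]
is_norm = [False, False, True]  # pretend only the last entry is a norm parameter
updated = sam_on_step(params, toy_grad, is_norm)
print(updated)
```

In this toy setting only the third coordinate is moved before the second gradient evaluation, so the first two coordinates receive an ordinary gradient step while the third receives a gradient computed at its perturbed value. Setting every entry of `is_norm` to `True` gives a conventional all-parameter SAM step under the same assumptions.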