Recent advances in learning algorithms have shown that the sharpness of the loss surface is an effective measure for reducing the generalization gap. Building on this idea, Sharpness-Aware Minimization (SAM) was proposed to enhance model generalization and achieved state-of-the-art performance. SAM consists of two main steps: a weight perturbation step and a weight update step. However, the perturbation in SAM is determined only by the gradient of the training loss, that is, the cross-entropy loss. As the model approaches a stationary point, this gradient becomes small and oscillates, which leads to inconsistent perturbation directions and can cause the gradient to diminish. Our research introduces an approach to further enhance model generalization. We propose the Adaptive Adversarial Cross-Entropy (AACE) loss function to replace the standard cross-entropy loss in SAM's perturbation step. In contrast to cross-entropy, the AACE loss and its gradient increase as the model nears convergence, which keeps the perturbation direction consistent and addresses the diminishing-gradient issue. In addition, we propose a novel perturbation-generating function that uses the AACE loss without normalization, enhancing the model's ability to explore near the optimum. Experiments confirm the effectiveness of AACE, showing improved performance on image classification tasks with Wide ResNet and PyramidNet across various datasets. The reproduction code is available online.
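To make the two steps of SAM concrete, the following is a minimal sketch of one standard SAM update on a toy problem. It uses the usual normalized perturbation built from the training-loss gradient, not the AACE loss, whose definition is not given in this abstract. The function name `sam_step`, the quadratic toy loss, and the values of `lr` and `rho` are assumptions made for illustration only.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One illustrative SAM update (hypothetical helper, not the authors' code)."""
    g = grad_fn(w)                            # gradient of the training loss at w
    e = rho * g / (np.linalg.norm(g) + eps)   # step 1: normalized perturbation of norm ~rho
    g_perturbed = grad_fn(w + e)              # gradient evaluated at the perturbed weights
    w_next = w - lr * g_perturbed             # step 2: update the original weights
    return w_next, e

# Toy quadratic loss L(w) = 0.5 * w^T A w, chosen only for illustration.
A = np.diag([10.0, 1.0])

def loss_fn(w):
    return 0.5 * w @ A @ w

def grad_fn(w):
    return A @ w

w = np.array([1.0, 1.0])
w_next, e = sam_step(w, grad_fn)
```

Because the perturbation divides by the gradient norm, its direction depends entirely on the direction of `g`. When `g` is small and oscillating near a stationary point, that direction can change from step to step, which is the behavior the abstract identifies as the motivation for AACE.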