Recent studies on deep neural networks show that flat minima of the loss landscape correlate with improved generalization. Sharpness-aware minimization (SAM) efficiently finds flat regions by updating the parameters according to the gradient at an adversarial perturbation. This perturbation depends on the Euclidean metric, making SAM non-invariant under reparametrizations, which blurs the relationship between sharpness and generalization. We propose Monge SAM (M-SAM), a reparametrization-invariant version of SAM obtained by equipping the parameter space with the Riemannian metric naturally induced by the loss surface. Compared to previous approaches, M-SAM works under any modeling choice and relies only on mild assumptions, while being as computationally efficient as SAM. We theoretically argue that M-SAM interpolates between SAM and gradient descent (GD), which increases robustness to hyperparameter selection and reduces attraction to suboptimal equilibria such as saddle points. We demonstrate this behavior both theoretically and empirically on a multi-modal representation alignment task.
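To make the contrast between SAM's Euclidean perturbation and a metric-aware one concrete, below is a minimal PyTorch sketch. It assumes the Riemannian metric in question is the Monge metric G = I + ∇L∇Lᵀ induced by embedding the loss graph in R^{d+1}; under that assumption, maximizing the first-order loss increase subject to a G-norm budget ρ rescales SAM's ascent step by 1/√(1 + ‖∇L‖²). The function name `msam_update`, the hyperparameters `rho` and `lr`, and the toy quadratic are all illustrative, and the exact update rule in the paper may differ.

```python
import torch

def msam_update(theta, loss_fn, rho=0.05, lr=0.1):
    # Gradient of the loss at the current parameters.
    theta = theta.detach().requires_grad_(True)
    loss = loss_fn(theta)
    g, = torch.autograd.grad(loss, theta)
    gnorm = g.norm()

    # SAM uses the Euclidean ascent step  eps = rho * g / ||g||.
    # Under the (assumed) Monge metric G = I + g g^T, maximizing
    # g^T eps subject to eps^T G eps = rho^2 gives the rescaled
    # perturbation below: it vanishes for large ||g|| (GD-like
    # behavior) and approaches the SAM step as ||g|| -> 0.
    eps = rho * g / (gnorm * torch.sqrt(1.0 + gnorm**2) + 1e-12)

    # Descend along the gradient evaluated at the perturbed point,
    # exactly as in SAM's two-step update.
    theta_adv = (theta + eps).detach().requires_grad_(True)
    adv_loss = loss_fn(theta_adv)
    g_adv, = torch.autograd.grad(adv_loss, theta_adv)
    return (theta - lr * g_adv).detach()

# Toy usage: a 2-D quadratic with different curvature per axis.
theta = torch.tensor([2.0, -1.5])
loss_fn = lambda t: 0.5 * (4.0 * t[0]**2 + 0.25 * t[1]**2)
for _ in range(100):
    theta = msam_update(theta, loss_fn)
print(theta)
```

Because the rescaling factor shrinks the perturbation wherever the gradient is large, the update behaves like plain GD far from critical points and like SAM near flat regions, which is consistent with the interpolation between SAM and GD described above.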