This paper presents a Domain-Inspired Sharpness-Aware Minimization (DISAM) algorithm for optimization under domain shifts. It is motivated by the inconsistent convergence degrees of SAM across different domains, which induce an optimization bias towards certain domains and thus impair the overall convergence. To address this issue, we incorporate domain-level convergence consistency into the sharpness estimation to prevent overwhelming (deficient) perturbations for less (well) optimized domains. Specifically, DISAM introduces a constraint that minimizes the variance of the domain losses, which enables elastic gradient calibration in perturbation generation: when one domain is optimized above the average level \textit{w.r.t.} loss, the gradient perturbation towards that domain is automatically weakened, and vice versa. Under this mechanism, we theoretically show that DISAM achieves faster overall convergence and improved generalization in principle when inconsistent convergence emerges. Extensive experiments on various domain generalization benchmarks show the superiority of DISAM over a range of state-of-the-art methods. Furthermore, we demonstrate the superior efficiency of DISAM when combined with pretrained models for parameter-efficient fine-tuning. The source code is released at https://github.com/MediaBrain-SJTU/DISAM.
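To make the calibration mechanism concrete, the following is a minimal sketch of a variance-constrained perturbation rule consistent with the description above, assuming $m$ source domains with losses $L_i(\theta)$, mean loss $\bar{L}(\theta) = \frac{1}{m}\sum_{i=1}^{m} L_i(\theta)$, a perturbation radius $\rho$, and a hypothetical trade-off coefficient $\lambda$; the notation is illustrative rather than the paper's exact formulation:
\[
\epsilon^{*} = \rho \cdot \frac{g(\theta)}{\lVert g(\theta) \rVert}, \qquad
g(\theta) = \sum_{i=1}^{m} \underbrace{\Big( \tfrac{1}{m} - \lambda \big( L_i(\theta) - \bar{L}(\theta) \big) \Big)}_{\text{per-domain weight}} \, \nabla_{\theta} L_i(\theta),
\]
where the second term of each weight arises from differentiating the variance penalty $\tfrac{\lambda}{2}\sum_i (L_i - \bar{L})^2$. A domain whose loss exceeds the average thus receives a smaller weight in the perturbation direction, while a domain below the average receives a larger one, realizing the elastic calibration described in the abstract.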