Flatness of the loss landscape has been widely studied as an important lens for understanding the behavior and generalization of deep learning algorithms. Motivated by this view, we propose Flatness-Aware Stochastic Gradient Langevin Dynamics (fSGLD), a first-order optimization method that biases its learning dynamics toward flat basins while retaining the computational and memory efficiency of SGD and SGLD. We provide a non-asymptotic theoretical analysis showing that, under a theoretically prescribed coupling between the noise scale $\sigma$ and the inverse temperature $\beta$, fSGLD converges to a flatness-biased Gibbs distribution, together with explicit excess risk guarantees. We empirically evaluate fSGLD on standard optimizer benchmarks, Bayesian image classification, uncertainty quantification, and out-of-distribution detection, demonstrating consistently strong performance and reliable uncertainty estimates. Additional experiments confirm the effectiveness of the prescribed $\beta$-$\sigma$ coupling over decoupled choices.
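The abstract does not spell out the update rule, so the following is only a minimal sketch under stated assumptions: a standard SGLD step whose gradient is evaluated at Gaussian-perturbed weights (a one-sample randomized-smoothing estimate, one plausible flatness mechanism consistent with the description above, not the paper's confirmed method). The function name `fsgld_step` and the specific form of the $\beta$-$\sigma$ coupling shown are hypothetical.

```python
import math
import torch

def fsgld_step(params, loss_fn, lr=1e-2, beta=1e4, sigma=None):
    """One fSGLD-style update (illustrative sketch, not the paper's exact rule).

    Assumed flatness mechanism: the gradient is taken at weights perturbed by
    N(0, sigma^2 I), i.e. a one-sample estimate of the gradient of the smoothed
    loss E_eps[L(theta + sigma * eps)], which penalizes sharp minima. The
    injected update noise follows standard SGLD at inverse temperature beta.
    """
    if sigma is None:
        # Hypothetical beta-sigma coupling for illustration; the paper
        # prescribes a specific theoretical relation, which this only mimics.
        sigma = 1.0 / math.sqrt(beta)
    # Evaluate the loss at randomly perturbed copies of the weights.
    perturbed = [(p + sigma * torch.randn_like(p)).detach().requires_grad_(True)
                 for p in params]
    loss = loss_fn(perturbed)
    grads = torch.autograd.grad(loss, perturbed)
    noise_scale = math.sqrt(2.0 * lr / beta)  # standard SGLD noise magnitude
    with torch.no_grad():
        for p, g in zip(params, grads):
            # Gradient step on the smoothed loss plus Langevin noise.
            p.add_(-lr * g + noise_scale * torch.randn_like(p))
    return loss.item()

# Usage on a toy quadratic objective:
w = [torch.zeros(2)]
fsgld_step(w, lambda ps: (ps[0] ** 2).sum())
```

Evaluating the gradient at perturbed weights makes the expected drift follow the Gaussian-smoothed loss, which is flatter around sharp minima; this is how the sketch realizes the "flatness-biased" drift, with the $\beta$-$\sigma$ coupling controlling how smoothing strength tracks the temperature.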