We study the implicit bias of momentum-based optimizers on homogeneous models. We first extend existing results on the implicit bias of steepest descent in homogeneous models to normalized steepest descent with an optional learning rate schedule. We then show that for smooth homogeneous models, momentum steepest descent algorithms such as Muon (spectral norm), MomentumGD ($\ell_2$ norm), and Signum ($\ell_\infty$ norm) are approximate steepest descent trajectories under a decaying learning rate schedule, proving that these algorithms, too, are biased toward KKT points of the corresponding margin maximization problem. We extend the analysis to Adam (without the stability constant), which maximizes the $\ell_\infty$ margin, and to Muon-Signum and Muon-Adam, which maximize a hybrid-norm margin. Our experiments corroborate the theory and show that which margin is maximized depends on the choice of optimizer. Overall, our results extend earlier lines of work on steepest descent in homogeneous models and on momentum-based optimizers in linear models.
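To make the norm dependence concrete, the following is a minimal illustrative sketch (not the paper's exact algorithms, and omitting momentum and the learning rate schedule) of how the normalized steepest-descent direction for a gradient matrix $G$ changes with the chosen norm: the $\ell_2$ norm gives the normalized gradient (as in MomentumGD), the $\ell_\infty$ norm gives the entrywise sign (as in Signum), and the spectral norm gives the orthogonalized factor $UV^\top$ from the SVD of $G$ (as in Muon). The function name `steepest_descent_direction` is our own for illustration.

```python
import numpy as np

def steepest_descent_direction(G, norm):
    """Normalized steepest-descent direction for gradient G under a given norm.

    "l2"       -> G / ||G||_2 (Frobenius-normalized gradient; MomentumGD-style)
    "linf"     -> sign(G) entrywise (Signum-style)
    "spectral" -> U @ V^T from the reduced SVD of G (Muon-style)
    """
    if norm == "l2":
        return G / np.linalg.norm(G)
    if norm == "linf":
        return np.sign(G)
    if norm == "spectral":
        U, _, Vt = np.linalg.svd(G, full_matrices=False)
        return U @ Vt
    raise ValueError(f"unknown norm: {norm}")

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))  # a stand-in gradient matrix

d_l2 = steepest_descent_direction(G, "l2")        # unit Frobenius norm
d_linf = steepest_descent_direction(G, "linf")    # entries in {-1, +1}
d_spec = steepest_descent_direction(G, "spectral")  # all singular values equal 1
```

Each direction has unit length in its own dual sense: the $\ell_2$ direction has unit Frobenius norm, the sign direction has unit $\ell_\infty$ norm entrywise, and the orthogonalized direction has all singular values equal to one.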