Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW while using less memory. As one might expect of an algorithm found by a random program search, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polyak momentum, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer across a wide range of tasks, its theoretical basis remains uncertain, and this lack of clarity limits opportunities to improve and extend it. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing the bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion enforces this constraint through its decoupled weight decay, where $\lambda$ is the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, in which the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of the general composite optimization problem $\min_x f(x) + \kappa^*(x)$, where $\kappa^*$ denotes the convex conjugate of $\kappa$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
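The bound-constraint claim can be illustrated numerically. The sketch below is not taken from this abstract: it assumes the commonly stated form of the Lion update (sign of an interpolation between momentum and gradient, plus decoupled weight decay) with the usual default coefficients $\beta_1 = 0.9$ and $\beta_2 = 0.99$. The function names `lion_step` and `run_lion`, the quadratic test loss, and the step size are all hypothetical choices made for illustration. The unconstrained minimizer of the test loss lies outside the box $\|x\|_\infty \leq 1/\lambda$, so the iterates would be expected to settle on the boundary of the box rather than at the minimizer.

```python
def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lion_step(x, m, grad, lr=0.01, beta1=0.9, beta2=0.99, weight_decay=1.0):
    """One Lion update on lists of floats (assumed form of the update rule).

    x: parameters, m: momentum, grad: gradient of the loss at x.
    Returns the updated (x, m).
    """
    new_x, new_m = [], []
    for xi, mi, gi in zip(x, m, grad):
        # Interpolate momentum and current gradient, then keep only the sign.
        c = beta1 * mi + (1 - beta1) * gi
        # Signed step plus decoupled weight decay (the lambda * x term).
        new_x.append(xi - lr * (sign(c) + weight_decay * xi))
        # The momentum buffer is updated with a separate coefficient.
        new_m.append(beta2 * mi + (1 - beta2) * gi)
    return new_x, new_m


def run_lion(targets, steps=2000, **kwargs):
    """Minimize the illustrative loss f(x) = 0.5 * sum((x_i - t_i)^2)."""
    x = [0.0] * len(targets)
    m = [0.0] * len(targets)
    for _ in range(steps):
        grad = [xi - ti for xi, ti in zip(x, targets)]
        x, m = lion_step(x, m, grad, **kwargs)
    return x


# The unconstrained minimizer is (5, -3); with weight_decay = 1 the box is
# |x_i| <= 1, so the iterates should approach the corner (1, -1).
result = run_lion([5.0, -3.0], weight_decay=1.0)
print(result)
```

In this toy run the sign term always points toward the unconstrained minimizer, so each coordinate follows $x \leftarrow x + \epsilon(\pm 1 - \lambda x)$ with step size $\epsilon$, and contracts geometrically toward $\pm 1/\lambda$. That matches the constrained picture described above, though it is a single example and not a substitute for the Lyapunov analysis.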