Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs on par with or better than AdamW while using less memory. As might be expected of an algorithm found by random program search, Lion combines elements of several existing algorithms, including signed momentum, decoupled weight decay, and Polyak and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer across a wide range of tasks, its theoretical basis remains uncertain, and this lack of clarity limits opportunities to improve and extend it. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing the bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through its decoupled weight decay, where $\lambda$ is the weight decay coefficient. Our analysis is made possible by a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, in which the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of the general composite optimization problem $\min_x f(x) + \kappa^*(x)$, where $\kappa^*$ denotes the convex conjugate of $\kappa$. Our findings provide insight into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
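To make the bound-constraint claim concrete, the following is a minimal sketch of the Lion update in plain Python. The update rule is the commonly cited one from the original Lion paper and is not spelled out in this abstract, so treat it as an assumption here. The toy quadratic loss, the names `lion_step`, `run_lion`, `grad_f`, `TARGET`, and `LAM`, and all hyperparameter values are hypothetical choices made only for illustration.

```python
def sign(v: float) -> int:
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lion_step(x, m, grad, lr, beta1, beta2, lam):
    """One Lion update on lists of floats; returns (new_x, new_m).

    Assumed update rule (not given in the abstract):
        c     = beta1 * m + (1 - beta1) * g
        x_new = x - lr * (sign(c) + lam * x)   # sign step + decoupled weight decay
        m_new = beta2 * m + (1 - beta2) * g
    """
    new_x, new_m = [], []
    for xi, mi, gi in zip(x, m, grad):
        c = beta1 * mi + (1.0 - beta1) * gi
        new_x.append(xi - lr * (sign(c) + lam * xi))
        new_m.append(beta2 * mi + (1.0 - beta2) * gi)
    return new_x, new_m


# Hypothetical toy loss: f(x) = 0.5 * sum_i (x_i - t_i)^2.
# The first target lies outside the box |x_i| <= 1/LAM, the second inside it.
TARGET = [5.0, 0.2]
LAM = 1.0


def grad_f(x):
    """Gradient of the toy quadratic loss."""
    return [xi - ti for xi, ti in zip(x, TARGET)]


def run_lion(x0, steps=2000, lr=0.01, beta1=0.9, beta2=0.99, lam=LAM):
    """Run Lion from x0 with zero-initialized momentum; return the final iterate."""
    x, m = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        x, m = lion_step(x, m, grad_f(x), lr, beta1, beta2, lam)
    return x


# Start outside the box to show that the iterates are pulled back into it.
x_final = run_lion([3.0, -3.0])
print(x_final)
```

Under these assumptions, the first coordinate, whose unconstrained minimizer 5 lies outside the box, should settle near the bound $1/\lambda = 1$, while the second should hover close to its unconstrained minimizer 0.2, with a residual oscillation on the order of the step size. This is consistent with reading Lion as minimizing $f(x)$ subject to $\|x\|_\infty \leq 1/\lambda$, though a single toy run is an illustration and not evidence for the general result.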