Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW while using less memory. As might be expected from the output of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polyak momentum, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer across a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further improve and extend Lion. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through its decoupled weight decay, where $\lambda$ is the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, in which the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of the general composite optimization problem $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
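To make the bound-constraint claim concrete, the following is a minimal sketch in plain Python. It assumes the Lion update in its commonly stated form (a sign of an interpolation between momentum and gradient, plus decoupled weight decay), which this abstract does not spell out. The function names, the toy quadratic loss, and all hyperparameter values are illustrative assumptions, with `wd` playing the role of $\lambda$.

```python
# Minimal sketch of a Lion-style update with decoupled weight decay.
# Assumption: the update rule below is the commonly stated form of Lion;
# lr, beta1, beta2, wd and the toy loss are illustrative choices only.

def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lion_step(x, m, grad, lr=0.01, beta1=0.9, beta2=0.99, wd=0.5):
    """Apply one Lion-style step to parameter list x with momentum list m."""
    new_x, new_m = [], []
    for xi, mi, gi in zip(x, m, grad):
        c = beta1 * mi + (1 - beta1) * gi            # direction fed to sign
        new_x.append(xi - lr * (sign(c) + wd * xi))  # sign step + weight decay
        new_m.append(beta2 * mi + (1 - beta2) * gi)  # momentum update
    return new_x, new_m


def run_demo(steps=5000, wd=0.5):
    """Minimize f(x) = 0.5 * sum((x_i - t_i)^2) and track max |x_i| seen."""
    target = [10.0, -0.5]   # first target lies outside the box |x_i| <= 1/wd
    x, m = [0.0, 0.0], [0.0, 0.0]
    max_abs = 0.0
    for _ in range(steps):
        grad = [xi - ti for xi, ti in zip(x, target)]
        x, m = lion_step(x, m, grad, wd=wd)
        max_abs = max(max_abs, max(abs(xi) for xi in x))
    return x, max_abs


if __name__ == "__main__":
    x, max_abs = run_demo()
    print("final x:", x)
    print("largest |x_i| seen:", max_abs, "bound 1/wd:", 1 / 0.5)
```

With `wd=0.5` the box is $|x_i| \leq 2$. In this toy run the first coordinate is driven toward the boundary value 2 rather than its unconstrained minimizer 10, and no iterate leaves the box. This follows from the update itself: since each step is $x \leftarrow (1 - \text{lr}\cdot\lambda)\,x - \text{lr}\cdot\text{sign}(c)$, any coordinate with $|x| \leq 1/\lambda$ stays within that range as long as $\text{lr}\cdot\lambda \leq 1$.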