LARS and LAMB have emerged as prominent techniques in Large Batch Learning (LBL) for stabilizing the training of AI models. One of the primary challenges in LBL is convergence stability: training tends to become trapped in sharp minimizers. To address this challenge, a relatively recent technique known as warm-up has been employed. However, warm-up lacks a strong theoretical foundation, which leaves room for exploring more effective algorithms. We therefore conduct empirical experiments to analyze the behavior of the two most popular optimizers in the LARS family, LARS and LAMB, with and without a warm-up strategy. These analyses give us new insight into LARS, LAMB, and the necessity of warm-up in LBL. Building on these insights, we propose a novel algorithm called Time Varying LARS (TVLARS), which enables robust training in the initial phase without the need for warm-up. Experimental evaluation demonstrates that TVLARS achieves results competitive with LARS and LAMB when they use warm-up, and surpasses their performance when they do not.
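For readers unfamiliar with the baseline discussed above, the following is a minimal sketch of a LARS-style layer-wise update combined with linear warm-up. It assumes the commonly cited LARS trust ratio (weight norm divided by gradient norm plus a weight-decay term) and a simple linear ramp of the global learning rate. All function names, parameter names, and default values are hypothetical and chosen for illustration. The sketch does not implement TVLARS, whose time-varying schedule is not specified in this abstract.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def lars_local_lr(weights, grads, trust_coef=0.001, weight_decay=0.0, eps=1e-9):
    """Layer-wise learning rate in the style of LARS (assumed form).

    local_lr = trust_coef * ||w|| / (||g|| + weight_decay * ||w|| + eps)
    Falls back to 1.0 when either norm is zero, so the update degrades
    to plain SGD for that layer.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0
    return trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)


def linear_warmup(step, warmup_steps, base_lr):
    """Ramp the global learning rate linearly up to base_lr (assumed schedule)."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps


def lars_step(weights, grads, step, base_lr=1.0, warmup_steps=0,
              trust_coef=0.001, weight_decay=0.0):
    """One update for a single layer; warmup_steps=0 disables warm-up."""
    global_lr = linear_warmup(step, warmup_steps, base_lr)
    local_lr = lars_local_lr(weights, grads, trust_coef, weight_decay)
    scale = global_lr * local_lr
    return [w - scale * (g + weight_decay * w) for w, g in zip(weights, grads)]
```

Calling `lars_step` with `warmup_steps=0` corresponds to the "without warm-up" setting compared in the abstract, while a positive `warmup_steps` shrinks the earliest updates, which is the behavior warm-up is intended to provide.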