The Muon optimizer has recently attracted attention for its orthogonalized first-order updates, and a deeper theoretical understanding of its convergence behavior is essential for guiding practical use. However, existing convergence guarantees are either coarse or obtained under restrictive analytical settings. In this work, we establish sharper convergence guarantees for Muon through a direct, simplified analysis that does not rely on restrictive assumptions on the update rule. Our results improve on existing bounds, achieving faster convergence rates while covering a broader class of problem settings. These findings give a more accurate theoretical characterization of Muon and offer insights that apply to a wider family of orthogonalized first-order methods.
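For readers unfamiliar with the phrase "orthogonalized first-order update", the following is a minimal illustrative sketch, not the exact procedure analyzed in this work. It assumes a single weight matrix, a heavy-ball style momentum buffer, and an exact SVD-based orthogonalization; practical Muon implementations typically approximate this step with a Newton-Schulz iteration instead. The function names `orthogonalize` and `muon_like_step` and the default values of `lr` and `beta` are hypothetical choices made for illustration only.

```python
import numpy as np


def orthogonalize(m):
    """Return the orthogonal polar factor U @ V^T of a matrix m.

    Assumption: an exact thin SVD is used here for clarity; it stands in
    for the cheaper iterative approximation used in practice.
    """
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def muon_like_step(w, grad, buf, lr=0.02, beta=0.95):
    """One illustrative orthogonalized momentum step (hypothetical helper).

    w    : weight matrix
    grad : gradient of the loss with respect to w
    buf  : momentum buffer with the same shape as w
    """
    buf = beta * buf + grad           # accumulate momentum
    w = w - lr * orthogonalize(buf)   # step along the orthogonalized direction
    return w, buf


# Usage: one step from zero weights and zero momentum on a random gradient.
rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 3))
w, buf = muon_like_step(np.zeros((4, 3)), grad, np.zeros((4, 3)))
```

Because the update direction has all singular values equal to one (for a full-rank buffer), the step size is governed by `lr` alone rather than by the gradient's scale, which is the structural property that convergence analyses of this family of methods have to account for.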