In this article, we explore the use of various matrix norms for optimizing functions of weight matrices, a crucial problem in deep learning. Moving beyond the spectral norm that underlies the Muon update, we leverage the duals of the Ky Fan norms to introduce the Fanion family of linear minimization oracle (LMO) algorithms, which are closely related to Muon, $ν$-SAM, and Dion. Staying within the LMO framework, we construct the families of F-Fanions and S-Fanions, whose updates are convex combinations of a Fanion update with the Normalized SGD or SignSGD update, respectively. The most promising algorithms in these families are F-Muon and S-Muon. Through an extensive empirical study of all three algorithm families across a wide range of tasks and settings, we demonstrate that F-Muon and S-Muon consistently match Muon's performance and outperform it on a synthetic smooth convex problem.
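
To make the construction concrete, here is a minimal NumPy sketch of the update directions described above. It is an illustration under stated assumptions, not the authors' implementation. It assumes that the LMO over the unit ball of the dual Ky Fan $k$-norm returns (up to sign) the truncated factor $U_k V_k^\top$ of the gradient's SVD, so that full rank recovers a Muon-style orthogonalized direction. The function names, the rank parameter `k`, and the mixing weight `alpha` are hypothetical, and any relative scaling between the two mixed terms is not specified here. An exact SVD stands in for the approximate orthogonalization used in practical Muon implementations, and momentum is omitted.

```python
import numpy as np


def fanion_direction(grad, k):
    """Return U_k V_k^T built from the top-k singular vectors of grad.

    Assumption: this is the (sign-flipped) LMO solution over the unit ball
    of the dual Ky Fan k-norm. The optimizer step subtracts it.
    """
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u[:, :k] @ vt[:k, :]


def muon_direction(grad):
    """Full-rank case: the orthogonalized gradient U V^T (Muon-style)."""
    return fanion_direction(grad, min(grad.shape))


def f_fanion_direction(grad, k, alpha):
    """Convex combination of a Fanion direction and Normalized SGD.

    alpha in [0, 1] is a hypothetical mixing weight.
    """
    nsgd = grad / (np.linalg.norm(grad) + 1e-12)  # Frobenius-normalized gradient
    return alpha * fanion_direction(grad, k) + (1.0 - alpha) * nsgd


def s_fanion_direction(grad, k, alpha):
    """Convex combination of a Fanion direction and SignSGD."""
    return alpha * fanion_direction(grad, k) + (1.0 - alpha) * np.sign(grad)


def step(weights, grad, lr, k, alpha):
    """One plain descent step with an F-Fanion direction (no momentum)."""
    return weights - lr * f_fanion_direction(grad, k, alpha)
```

Under these assumptions, an F-Muon-like direction corresponds to `f_fanion_direction(grad, min(grad.shape), alpha)` and an S-Muon-like direction to `s_fanion_direction(grad, min(grad.shape), alpha)`. Setting `alpha = 1` recovers the pure Fanion direction, and `alpha = 0` recovers Normalized SGD or SignSGD.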