Modern deep learning optimization involves heterogeneous parameter structures, noisy gradients, and highly nonconvex loss landscapes, which pose significant challenges for both algorithm design and theoretical analysis. Motivated by the limitations of SGD and the success of adaptive optimizers, we propose {\it Schattor}, a family of adaptive first-order methods based on Schatten norms. Schattor unifies SGD and Muon, a recently proposed matrix-variate adaptive optimizer, within a single Schatten-norm-based framework. Using a novel moment bound for matrix martingales, we establish dimension-free stationarity guarantees for the Schattor family on stochastic matrix optimization problems. We also develop multi-block extensions that adaptively balance optimization progress across blocks, and we prove dimension-free stationarity guarantees in this more general setting.
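One standard way to see how a single Schatten-norm framework can cover both SGD and Muon is normalized steepest descent under the Schatten-p norm. This is an illustrative sketch, not the paper's Schattor update, which may add momentum, step-size rules, or adaptive scaling not shown here. For a gradient G = U diag(s) V^T and dual exponent q with 1/p + 1/q = 1, the unit-norm direction that maximizes trace(G^T D) is U diag(s^(q-1)) V^T, rescaled to unit norm. Taking p = 2 gives the Frobenius-normalized gradient G / ||G||_F, which is the SGD direction up to scale. Taking p = ∞ gives the orthogonalized direction U V^T used by Muon. The function names `schatten_norm` and `schatten_steepest_direction` are hypothetical.

```python
import numpy as np


def schatten_norm(X: np.ndarray, p: float) -> float:
    """Schatten-p norm: the vector p-norm of the singular values of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.linalg.norm(s, ord=p))


def schatten_steepest_direction(G: np.ndarray, p: float) -> np.ndarray:
    """Return D with ||D||_{S_p} = 1 that maximizes <G, D> = trace(G^T D).

    A descent step moves along -D. With G = U diag(s) V^T and the dual
    exponent q (1/p + 1/q = 1), the maximizer is U diag(s^(q-1)) V^T divided
    by ||s||_q^(q-1), and the attained value <G, D> equals ||G||_{S_q}.
    """
    if not p > 1:
        raise ValueError("this sketch assumes p > 1 (p = inf is allowed)")
    G = np.asarray(G, dtype=float)
    q = 1.0 if np.isinf(p) else p / (p - 1.0)

    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    # Treat numerically tiny singular values as zero (avoids 0 ** 0 when q = 1).
    tol = s.max(initial=0.0) * max(G.shape) * np.finfo(G.dtype).eps
    keep = s > tol
    if not keep.any():
        return np.zeros_like(G)

    weights = np.zeros_like(s)
    weights[keep] = s[keep] ** (q - 1.0)
    scale = np.linalg.norm(s[keep], ord=q) ** (q - 1.0)
    # (U * weights) multiplies column j of U by weights[j].
    return (U * weights) @ Vt / scale
```

In a training loop, one would take a step such as `W -= lr * schatten_steepest_direction(G, p)`. Practical optimizers rarely use this exact form: Muon, for example, applies the orthogonalization to a momentum buffer and approximates U V^T with a Newton-Schulz iteration instead of a full SVD.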