Implicit bias induced by gradient-based algorithms is essential to the generalization of overparameterized models, yet its mechanisms can be subtle. This work leverages the Normalized Steepest Descent (NSD) framework to investigate how optimization geometry shapes the solutions reached on multiclass separable data. We introduce NucGD, a geometry-aware optimizer designed to enforce low-rank structure through nuclear-norm constraints. Beyond the algorithm itself, we connect NucGD with emerging low-rank projection methods, providing a unified perspective. To enable scalable training, we derive an efficient SVD-free update rule based on asynchronous power iteration. Furthermore, we empirically dissect the impact of stochastic optimization dynamics, characterizing how different levels of gradient noise, induced by mini-batch sampling and momentum, modulate convergence toward the expected maximum-margin solutions. Our code is available at: https://github.com/Tsokarsic/observing-the-implicit-bias-on-multiclass-seperable-data.
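The following is a minimal sketch of what a nuclear-norm NSD step with an SVD-free update might look like, under our own assumptions. It is not the authors' released implementation. It relies on one standard fact: the spectral norm is the dual of the nuclear norm, so over the unit nuclear-norm ball the inner product <G, D> is maximized by the rank-one matrix u1 v1^T built from the top singular pair of the gradient G. The normalized steepest-descent step is therefore W <- W - lr * u1 v1^T. We read "asynchronous power iteration" loosely as warm-starting a few power-iteration steps from the previous update's right singular vector; this reading is an assumption. The function names `top_singular_pair` and `nucgd_step` are hypothetical. Plain Python lists keep the sketch dependency-free.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def _normalize(x: Vector) -> Vector:
    """Scale x to unit Euclidean norm (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(t * t for t in x))
    return [t / n for t in x] if n > 0.0 else list(x)


def _matvec(A: Matrix, x: Vector) -> Vector:
    """Compute A @ x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def _rmatvec(A: Matrix, y: Vector) -> Vector:
    """Compute A.T @ y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def top_singular_pair(G: Matrix, v: Vector, iters: int = 1) -> Tuple[Vector, Vector]:
    """Approximate the leading singular vectors (u, v) of G by power iteration,
    warm-started from a previous right singular vector v; no SVD is formed.
    At least one iteration is always run so that u is defined."""
    for _ in range(max(1, iters)):
        u = _normalize(_matvec(G, v))
        v = _normalize(_rmatvec(G, u))
    return u, v


def nucgd_step(W: Matrix, G: Matrix, v_prev: Vector, lr: float,
               iters: int = 1) -> Tuple[Matrix, Vector]:
    """One hypothetical normalized steepest-descent step under the nuclear norm.

    The step is the rank-one update W <- W - lr * u v^T, where (u, v)
    approximates the top singular pair of G. Because v is recomputed from u,
    u^T G v = ||G^T u|| >= 0, so the step never increases the first-order
    (linearized) loss. The returned v warm-starts the next call.
    """
    u, v = top_singular_pair(G, v_prev, iters)
    W_new = [[W[i][j] - lr * u[i] * v[j] for j in range(len(W[0]))]
             for i in range(len(W))]
    return W_new, v


if __name__ == "__main__":
    G = [[3.0, 0.0], [0.0, 1.0]]  # toy gradient with singular values 3 and 1
    W = [[0.0, 0.0], [0.0, 0.0]]
    W, v = nucgd_step(W, G, v_prev=[1.0, 1.0], lr=0.1, iters=50)
    print(W)  # close to [[-0.1, 0.0], [0.0, 0.0]]: a rank-one update along e1 e1^T
```

In a training loop, one would presumably keep `v` across steps with `iters=1`. Each update then costs two matrix-vector products instead of a full SVD, at the price of tracking the top singular direction only approximately.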