We investigate Stochastic Mirror Descent (SMD) with matrix parameters and vector-valued predictions, a framework relevant to multi-class classification and matrix completion. Focusing on the overparameterized regime, where the total number of parameters exceeds the number of training samples, we prove that SMD with a matrix mirror function $\psi(\cdot)$ converges exponentially fast to a global interpolator. Furthermore, we generalize the classical implicit-bias results for vector SMD by showing that matrix SMD converges to the unique solution minimizing the Bregman divergence induced by $\psi(\cdot)$ from the initialization, subject to interpolating the data. These findings reveal how matrix mirror maps dictate the inductive bias in high-dimensional, multi-output problems.
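The matrix SMD update can be sketched as follows. Everything here is an illustrative assumption, not the paper's setup: a toy overparameterized linear model with vector-valued predictions $W x_i$, and the mirror map $\psi$ chosen as the squared Frobenius norm, for which $\nabla\psi$ is the identity and the mirror step reduces to plain SGD (other potentials change which Bregman divergence is implicitly minimized, but not the update's form).

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized setting: W has m*d = 60 parameters, while the n = 10
# vector-valued samples impose only n*m = 30 scalar constraints, so exact
# interpolators exist.
m, d, n = 3, 20, 10
X = rng.standard_normal((n, d))   # inputs x_i as rows
Y = rng.standard_normal((n, m))   # vector-valued targets y_i as rows

# Mirror map psi(W) = (1/2)||W||_F^2: grad psi is the identity, so the
# SMD update below coincides with SGD.
grad_psi = lambda W: W
grad_psi_inv = lambda Z: Z

W = np.zeros((m, d))   # initialization W_0
eta = 0.01             # step size, assumed small enough for stability

for _ in range(20000):
    i = rng.integers(n)                  # draw one training sample uniformly
    r = W @ X[i] - Y[i]                  # residual of the vector prediction
    grad = np.outer(r, X[i])             # gradient of (1/2)||W x_i - y_i||^2 in W
    # SMD step in dual coordinates: grad psi(W_{t+1}) = grad psi(W_t) - eta * grad
    W = grad_psi_inv(grad_psi(W) - eta * grad)

# Training error is numerically zero: the iterates converge to an interpolator.
print(float(np.max(np.abs(X @ W.T - Y))))
```

With this Frobenius potential the limit is the minimum-distance interpolator from $W_0$ in Frobenius norm; swapping in a different $\psi$ (and its gradient inverse) changes only the two `grad_psi` lines and hence the implicit bias.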