Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial effort has gone into characterizing how optimization algorithms affect generalization through their "preferred" solutions, a phenomenon commonly referred to as implicit regularization. In particular, it has been argued that gradient descent (GD) induces an implicit $\ell_2$-norm regularization in regression and classification problems. However, existing results on the implicit regularization of different algorithms are each confined to either a specific geometry or a particular class of learning problems, so a general approach for controlling implicit regularization is still missing. To address this, we present a unified approach using mirror descent (MD), a notable generalization of GD, to control implicit regularization in both regression and classification settings. More specifically, we show that MD with the general class of homogeneous potential functions converges in direction to a generalized maximum-margin solution for linear classification problems, thereby answering a long-standing question in the classification setting. Further, we show that MD can be implemented efficiently and enjoys fast convergence under suitable conditions. Through comprehensive experiments, we demonstrate that MD is a versatile method for producing learned models with different regularizers, which in turn differ in generalization performance.
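As an illustration of the mirror descent update described above, the following is a minimal sketch rather than the authors' implementation. It assumes one particular homogeneous potential, $\psi(w) = \frac{1}{p}\sum_i |w_i|^p$ with $p > 1$, which the abstract does not specify, and it assumes a linear classifier trained with the logistic loss on labels in $\{-1, +1\}$. The function names (`mirror_map`, `inverse_mirror_map`, `logistic_loss_grad`, `mirror_descent`) and the toy data are hypothetical. Each step maps the weights to the dual space through $\nabla\psi$, takes a gradient step there, and maps back; with $p = 2$ the update reduces to plain GD.

```python
import math


def mirror_map(w, p):
    """Gradient of the assumed potential psi(w) = (1/p) * sum_i |w_i|^p."""
    return [math.copysign(abs(wi) ** (p - 1.0), wi) for wi in w]


def inverse_mirror_map(z, p):
    """Inverse of mirror_map: recover primal weights from the dual point z."""
    return [math.copysign(abs(zi) ** (1.0 / (p - 1.0)), zi) for zi in z]


def logistic_loss_grad(w, X, y):
    """Gradient of the average logistic loss, labels y_i in {-1, +1}."""
    n = len(X)
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m))
        coeff = -yi / (1.0 + math.exp(margin))
        for j, xj in enumerate(xi):
            grad[j] += coeff * xj / n
    return grad


def mirror_descent(X, y, p=3.0, lr=0.1, steps=500):
    """Run mirror descent from w = 0 with the assumed p-th power potential."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = logistic_loss_grad(w, X, y)
        z = mirror_map(w, p)                         # primal -> dual
        z = [zj - lr * gj for zj, gj in zip(z, g)]   # gradient step in dual space
        w = inverse_mirror_map(z, p)                 # dual -> primal
    return w


if __name__ == "__main__":
    # Hypothetical linearly separable toy data.
    X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
    y = [1, 1, -1, -1]
    for p in (1.5, 2.0, 3.0):
        w = mirror_descent(X, y, p=p)
        norm = sum(abs(wi) ** p for wi in w) ** (1.0 / p)
        print(p, [wi / norm for wi in w])  # direction, normalized in the l_p norm
```

Because the mirror map here acts coordinate-wise, each step costs about the same as a GD step. Varying `p` changes the geometry of the update, which is the mechanism the abstract points to for steering the implicit regularizer.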