We provide a unified framework for approximately characterizing, in closed form, the implicit bias of gradient descent in the overparameterized regime. The framework applies to a general family of convex losses and covers both binary and multiclass settings. Specifically, we show that in high dimensions the implicit bias is approximated by (but not exactly equal to) the minimum-norm interpolation, which is the solution that arises from training on the squared loss. In contrast to prior work, which was tailored to exponentially-tailed losses and relied on an intermediate support-vector-machine formulation, our framework builds directly on the primal-dual analysis of Ji and Telgarsky (2021), allowing us to provide new approximate equivalences for general convex losses through a novel sensitivity analysis. Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings. Finally, we provide evidence for the tightness of our techniques and use them to demonstrate how certain loss functions designed for out-of-distribution problems affect the closed-form solution.
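As a point of reference for the minimum-norm interpolation mentioned above, the following is a minimal NumPy sketch of the well-known squared-loss baseline only: gradient descent started from zero on an overparameterized least-squares problem converges to the minimum-norm interpolator. It does not implement the paper's framework or its sensitivity analysis. The function names, the synthetic Gaussian data, the problem sizes, and the step-size choice are all illustrative assumptions.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Minimum-norm solution of X w = y, computed with the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def gd_squared_loss(X, y, steps=500):
    """Gradient descent on (1 / 2n) * ||X w - y||^2, started from w = 0."""
    n, d = X.shape
    w = np.zeros(d)
    # Step size 1/L, where L = sigma_max(X)^2 / n is the smoothness constant.
    lr = n / np.linalg.norm(X, 2) ** 2
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n
    return w


# Synthetic overparameterized problem (assumed setup): n samples, d >> n features.
rng = np.random.default_rng(0)
n, d = 20, 500
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)  # binary labels in {-1, +1}

w_gd = gd_squared_loss(X, y)
w_mn = min_norm_interpolator(X, y)

# Relative distance between the gradient-descent iterate and the closed form.
gap = np.linalg.norm(w_gd - w_mn) / np.linalg.norm(w_mn)
print(f"relative gap: {gap:.2e}")
```

Starting from zero keeps every iterate in the row space of `X`, which is why the limit is the minimum-norm interpolator rather than some other interpolating solution. Swapping the squared loss for another convex loss in `gd_squared_loss` would be one way to probe the approximate equivalence numerically, though the size of the resulting gap is exactly what the paper's analysis addresses and is not asserted here.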