Tight Risk Bounds for Gradient Descent on Separable Data

We study the generalization properties of unregularized gradient methods applied to separable linear classification -- a setting that has received considerable attention since the pioneering work of Soudry et al. (2018). We establish tight upper and lower (population) risk bounds for gradient descent in this setting, for any smooth loss function, expressed in terms of its tail decay rate. Our bounds take the form $\Theta\left(\frac{r_{\ell,T}^2}{\gamma^2 T} + \frac{r_{\ell,T}^2}{\gamma^2 n}\right)$, where $T$ is the number of gradient steps, $n$ is the size of the training set, $\gamma$ is the data margin, and $r_{\ell,T}$ is a complexity term that depends on the tail decay rate of the loss function and on $T$. Our upper bound matches the best known upper bounds, due to Shamir (2021) and Schliserman and Koren (2022), while extending their applicability to virtually any smooth loss function and relaxing the technical assumptions they impose. Our risk lower bounds are the first in this context, and they establish the tightness of our upper bounds for any given tail decay rate and in all parameter regimes. The proof technique behind these results is also markedly simpler than that of previous work and is straightforward to extend to other gradient methods; we illustrate this by providing analogous results for Stochastic Gradient Descent.
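As a rough illustration of the setting (not the paper's experiments or proof technique), the sketch below runs unregularized gradient descent with the logistic loss on synthetic linearly separable data and reports the training and held-out loss after $T$ steps. Everything specific here is an assumption made for the example: the two-dimensional data generator, the margin value `gamma=0.1`, the step size `lr=1.0`, the choice of the logistic loss as the smooth loss, and all function names.

```python
import math
import random


def logistic_loss(z):
    """Numerically stable log(1 + exp(-z)); a smooth loss with an exponential tail."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def logistic_grad(z):
    """Derivative of log(1 + exp(-z)) with respect to z, i.e. -1 / (1 + exp(z))."""
    if z >= 0:
        e = math.exp(-z)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(z))


def make_separable_data(n, gamma, seed):
    """Sample n points in the unit disk, labeled by the sign of the first
    coordinate, keeping only points with margin >= gamma w.r.t. w* = (1, 0).
    (Hypothetical generator, chosen only for illustration.)"""
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if math.hypot(x[0], x[1]) > 1.0:
            continue
        y = 1 if x[0] > 0 else -1
        if y * x[0] >= gamma:
            data.append((x, y))
    return data


def average_loss(w, data):
    """Average logistic loss of the linear predictor w on the given sample."""
    return sum(
        logistic_loss(y * (w[0] * x[0] + w[1] * x[1])) for x, y in data
    ) / len(data)


def gradient_descent(data, steps, lr):
    """Plain (unregularized) full-batch gradient descent started at w = 0."""
    w = [0.0, 0.0]
    n = len(data)
    for _ in range(steps):
        g = [0.0, 0.0]
        for x, y in data:
            z = y * (w[0] * x[0] + w[1] * x[1])
            c = logistic_grad(z) * y / n
            g[0] += c * x[0]
            g[1] += c * x[1]
        w[0] -= lr * g[0]
        w[1] -= lr * g[1]
    return w


train = make_separable_data(n=50, gamma=0.1, seed=0)
held_out = make_separable_data(n=2000, gamma=0.1, seed=1)

for T in (10, 100, 1000):
    w = gradient_descent(train, steps=T, lr=1.0)
    print(T, average_loss(w, train), average_loss(w, held_out))
```

The held-out loss serves only as an empirical stand-in for the population risk in the abstract. The script does not compute $r_{\ell,T}$ or verify the stated rates; it only shows the quantities ($T$, $n$, $\gamma$) that the bound is expressed in.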

