We consider the generalization error associated with stochastic gradient descent on a smooth convex function over a compact set. We show the first bound on the generalization error that vanishes when the number of iterations $T$ and the dataset size $n$ go to infinity at arbitrary rates; our bound scales as $\tilde{O}(1/\sqrt{T} + 1/\sqrt{n})$ with step-size $\alpha_t = 1/\sqrt{t}$. In particular, strong convexity is not needed for stochastic gradient descent to generalize well.
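For concreteness, the following is a minimal sketch of the procedure the abstract analyzes: projected stochastic gradient descent over a compact set with step size $\alpha_t = 1/\sqrt{t}$. The least-squares objective, the unit-ball constraint, and the function names (`projected_sgd`, `grad_sample`, `project`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def projected_sgd(grad_sample, project, x0, T, rng):
    """Projected SGD with step size alpha_t = 1/sqrt(t).

    grad_sample(x, rng) returns a stochastic gradient at x;
    project(x) maps the iterate back onto the compact feasible set.
    """
    x = np.asarray(x0, dtype=float)
    for t in range(1, T + 1):
        g = grad_sample(x, rng)
        x = project(x - g / np.sqrt(t))  # alpha_t = 1/sqrt(t)
    return x

# Toy instance (illustrative only): least squares over the unit ball.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 5)), rng.standard_normal(100)

def grad_sample(x, rng):
    i = rng.integers(len(b))          # sample one data point uniformly
    return (A[i] @ x - b[i]) * A[i]   # gradient of 0.5 * (a_i . x - b_i)^2

def project(x):
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm  # Euclidean projection onto the unit ball

x_T = projected_sgd(grad_sample, project, np.zeros(5), T=1000, rng=rng)
```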