Stochastic convex optimization is one of the best-studied models of learning in modern machine learning. Nevertheless, a fundamental question in this setup has remained unresolved: "How many data points must be observed so that any empirical risk minimizer (ERM) performs well on the true population?" The question was posed by Feldman (2016), who proved that $\Omega(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points are necessary (where $d$ is the dimension and $\epsilon>0$ is the accuracy parameter). Proving an $\omega(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ lower bound was left as an open problem. In this work we show that in fact $\tilde{O}(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points are also sufficient. This settles the question and yields a new separation between ERMs and uniform convergence. The sample complexity bound holds for the classical setup of learning bounded convex Lipschitz functions over the Euclidean unit ball. We further generalize the result and show that a similar upper bound holds for all symmetric convex bodies. The general bound consists of two terms: (i) a term of the form $\tilde{O}(\frac{d}{\epsilon})$, with an inverse-linear dependence on the accuracy parameter, and (ii) a term that depends on the statistical complexity of the class of $\textit{linear}$ functions (captured by the Rademacher complexity). The proof builds a mechanism for controlling the behavior of stochastic convex optimization problems.
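For concreteness, the question can be written out as follows. This is only a sketch: the symbols $\mathcal{K}$, $\mathcal{D}$, $f$, $F$, $\hat{F}_S$ and $n(\epsilon)$ are standard notation assumed here for illustration and do not come from the abstract itself, and constants, logarithmic factors and the confidence level are left unspecified.

$$
\begin{aligned}
&\mathcal{K}=\{w\in\mathbb{R}^d:\ \|w\|_2\le 1\},\qquad f(\cdot,z)\ \text{convex, bounded and Lipschitz on } \mathcal{K} \text{ for every } z,\\
&F(w)=\mathbb{E}_{z\sim\mathcal{D}}\,[f(w,z)],\qquad \hat{F}_S(w)=\frac{1}{n}\sum_{i=1}^{n} f(w,z_i)\quad\text{for } S=(z_1,\dots,z_n)\sim\mathcal{D}^n,\\
&n(\epsilon)=\min\Big\{\,n:\ \text{with high probability over } S,\ \ F(\hat{w})-\min_{w\in\mathcal{K}}F(w)\le\epsilon\ \ \text{for every } \hat{w}\in\arg\min_{w\in\mathcal{K}}\hat{F}_S(w)\Big\},\\
&n(\epsilon)=\Omega\Big(\frac{d}{\epsilon}+\frac{1}{\epsilon^{2}}\Big)\ \text{(Feldman, 2016)}\quad\text{and}\quad n(\epsilon)=\tilde{O}\Big(\frac{d}{\epsilon}+\frac{1}{\epsilon^{2}}\Big)\ \text{(this work)}.
\end{aligned}
$$

The requirement is over every minimizer of $\hat{F}_S$, not one particular algorithm's output, which is what distinguishes this question from bounding the sample complexity of a single learning rule.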