The well-known empirical risk minimization (ERM) principle is the basis of many widely used machine learning algorithms and plays an essential role in classical PAC theory. A common description of a learning algorithm's performance is its so-called "learning curve", that is, the decay of the expected error as a function of the input sample size. Since the PAC model fails to explain the behavior of learning curves, recent research has explored an alternative universal learning model and has ultimately revealed a distinction between optimal universal and uniform learning rates (Bousquet et al., 2021). However, a basic understanding of such differences with a particular focus on the ERM principle has yet to be developed. In this paper, we consider the problem of universal learning by ERM in the realizable case and study the possible universal rates. Our main result is a fundamental tetrachotomy: there are only four possible universal learning rates by ERM. Namely, the learning curve of any concept class learnable by ERM decays at an $e^{-n}$, $1/n$, or $\log(n)/n$ rate, or at an arbitrarily slow rate. Moreover, we provide a complete characterization of which concept classes fall into each of these categories, via new complexity structures. We also develop new combinatorial dimensions which supply sharp asymptotically-valid constant factors for these rates, whenever possible.
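As a concrete illustration of a learning curve (not part of the paper's results), the sketch below runs ERM on the class of one-dimensional thresholds $h_t(x) = \mathbb{1}[x \ge t]$ in the realizable case and Monte Carlo-estimates the expected error at several sample sizes; for this class the curve decays at the $1/n$ rate. The target threshold, marginal distribution, and the particular ERM tie-breaking rule are illustrative choices.

```python
import random

def erm_threshold(sample):
    """An ERM for thresholds h_t(x) = 1[x >= t]: among all thresholds
    consistent with the (realizable) sample, return the leftmost positive
    point (or 1.0 if no positive examples were seen)."""
    positives = [x for x, y in sample if y == 1]
    return min(positives) if positives else 1.0

def expected_error(n, t_star=0.5, trials=2000, rng=random.Random(0)):
    """Monte Carlo estimate of the expected error of ERM at sample size n,
    with X ~ Uniform[0, 1] and labels y = 1[x >= t_star]."""
    total = 0.0
    for _ in range(trials):
        sample = [(x, int(x >= t_star)) for x in (rng.random() for _ in range(n))]
        t_hat = erm_threshold(sample)
        # Predictions disagree with the target exactly on [t_star, t_hat),
        # so the error is the Lebesgue measure of that interval.
        total += abs(t_hat - t_star)
    return total / trials

# Expected error as a function of sample size: the learning curve.
curve = {n: expected_error(n) for n in (10, 40, 160, 640)}
```

Printing `curve` shows the error shrinking roughly in proportion to $1/n$: each fourfold increase in the sample size cuts the estimated error by roughly a factor of four.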