This paper studies the problem of learning an unknown function $f$ from given data about $f$. The learning problem is to give an approximation $\hat f$ to $f$ that predicts the values of $f$ away from the data. There are numerous settings for this learning problem, depending on (i) what additional information we have about $f$ (known as a model class assumption), (ii) how we measure the accuracy with which $\hat f$ predicts $f$, (iii) what is known about the data and data sites, and (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, it is shown in this paper that a near-optimal $\hat f$ can be found by solving a certain discrete over-parameterized optimization problem with a penalty term. Here, near-optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization, which is commonly used in modern machine learning. The main results of this paper prove that over-parameterized learning with an appropriate loss function gives a near-optimal approximation $\hat f$ of the function $f$ from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near-optimal recovery of $f$. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.