As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes. Currently, theoretical understanding of the learning curves that characterize how the prediction error depends on the number of samples is restricted to either large-sample asymptotics ($m\to\infty$) or, for certain simple data distributions, to the high-dimensional asymptotics in which the number of samples scales linearly with the dimension ($m\propto d$). There is a wide gulf between these two regimes, including all higher-order scaling relations $m\propto d^r$, which are the subject of the present paper. We focus on the problem of kernel ridge regression for dot-product kernels and present precise formulas for the mean of the test error, bias, and variance, for data drawn uniformly from the sphere with isotropic random labels in the $r$th-order asymptotic scaling regime $m\to\infty$ with $m/d^r$ held constant. We observe a peak in the learning curve whenever $m \approx d^r/r!$ for any integer $r$, leading to multiple sample-wise descent and nontrivial behavior at multiple scales.
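To give a sense of the scales involved, the peak condition $m \approx d^r/r!$ can be tabulated for a given dimension. The following is a minimal sketch, not code from the paper: the helper names `peak_sample_size` and `peak_table` are hypothetical, and restricting to $r \ge 1$ is an assumption made here for illustration.

```python
from math import factorial


def peak_sample_size(d: int, r: int) -> float:
    """Approximate sample count m = d**r / r! at which a peak is reported.

    Hypothetical helper: it only evaluates the formula quoted in the text;
    it does not simulate kernel ridge regression.
    """
    if d < 1 or r < 1:
        # Assumption: only positive dimension and positive integer order.
        raise ValueError("d and r must be positive integers")
    return d**r / factorial(r)


def peak_table(d: int, max_r: int) -> list[tuple[int, float]]:
    """List (r, d**r / r!) for r = 1, ..., max_r."""
    return [(r, peak_sample_size(d, r)) for r in range(1, max_r + 1)]


if __name__ == "__main__":
    d = 100
    for r, m in peak_table(d, max_r=3):
        # m / d**r equals 1 / r!, the constant held fixed in the r-th regime.
        print(f"r={r}: m ~ {m:,.0f}   (m / d^r = {m / d**r:.4f})")
```

For a fixed dimension the predicted peaks are separated by roughly a factor of $d/(r+1)$ in sample count, which is why they appear at well-separated scales of $m$.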