Recent advances in machine learning have been achieved by using overparametrized models trained until near interpolation of the training data. It was shown, e.g., through the double descent phenomenon, that the number of parameters is a poor proxy for model complexity and generalization capabilities. This leaves open the question of understanding the impact of parametrization on the performance of these models. How do model complexity and generalization depend on the number of parameters $p$? How should we choose $p$ relative to the sample size $n$ to achieve optimal test error? In this paper, we investigate the example of random feature ridge regression (RFRR). This model can be seen either as a finite-rank approximation to kernel ridge regression (KRR), or as a simplified model for neural networks trained in the so-called lazy regime. We consider covariates uniformly distributed on the $d$-dimensional sphere and compute sharp asymptotics for the RFRR test error in the high-dimensional polynomial scaling, where $p, n, d \to \infty$ while $p / d^{\kappa_1}$ and $n / d^{\kappa_2}$ stay constant, for all $\kappa_1, \kappa_2 \in \mathbb{R}_{>0}$. These asymptotics precisely characterize the impact of the number of random features and of the regularization parameter on the test performance. In particular, RFRR exhibits an intuitive trade-off between approximation and generalization power. For $n = o(p)$, the sample size $n$ is the bottleneck and RFRR achieves the same performance as KRR (which is equivalent to taking $p = \infty$). On the other hand, if $p = o(n)$, the number of random features $p$ is the limiting factor and the RFRR test error matches the approximation error of the random feature model class (akin to taking $n = \infty$). Finally, a double descent appears at $n = p$, a phenomenon that was previously only characterized in the linear scaling $\kappa_1 = \kappa_2 = 1$.
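To make the RFRR setup concrete, the following is a minimal finite-size sketch in Python (NumPy), not the construction analyzed in the paper. Several choices are assumptions made for illustration only: the sphere radius $\sqrt{d}$, the `tanh` activation, the $1/\sqrt{p}$ feature normalization, the target function, and the noise level. The helper names (`sample_sphere`, `random_features`, `fit_rfrr`, `rfrr_test_error`) are hypothetical.

```python
import numpy as np


def sample_sphere(n, d, rng):
    """Draw n points uniformly on the sphere of radius sqrt(d) in R^d.

    The radius sqrt(d) is an assumed normalization, not taken from the abstract.
    """
    x = rng.standard_normal((n, d))
    return np.sqrt(d) * x / np.linalg.norm(x, axis=1, keepdims=True)


def random_features(X, W, activation=np.tanh):
    """Feature map sigma(<x, w_j>) / sqrt(p) for the p rows w_j of W.

    The tanh activation and the 1/sqrt(p) scaling are illustrative choices.
    """
    p = W.shape[0]
    return activation(X @ W.T) / np.sqrt(p)


def fit_rfrr(X, y, W, lam):
    """Ridge regression on the random features: solve (Z^T Z + lam I) a = Z^T y."""
    Z = random_features(X, W)
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)


def rfrr_test_error(n, p, d, lam, rng, n_test=2000, noise_std=0.1):
    """Monte Carlo estimate of the RFRR test error for one draw of data and features."""
    # Random first-layer weights, drawn uniformly on the unit sphere and then frozen.
    W = sample_sphere(p, d, rng) / np.sqrt(d)

    # Hypothetical target: a linear term plus a degree-2 interaction.
    def target(X):
        return X[:, 0] + X[:, 0] * X[:, 1]

    X_train = sample_sphere(n, d, rng)
    X_test = sample_sphere(n_test, d, rng)
    y_train = target(X_train) + noise_std * rng.standard_normal(n)

    a = fit_rfrr(X_train, y_train, W, lam)
    pred = random_features(X_test, W) @ a
    return float(np.mean((pred - target(X_test)) ** 2))
```

For example, calling `rfrr_test_error(n=200, p=p, d=10, lam=1e-3, rng=np.random.default_rng(0))` over a range of `p` gives a rough empirical view of how the test error varies with the number of features at fixed sample size. Such a small simulation is only suggestive and does not reproduce the sharp asymptotics described above.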