Asymptotics of Random Feature Regression Beyond the Linear Scaling Regime

Recent advances in machine learning have been achieved by using overparametrized models trained until near interpolation of the training data. It was shown, e.g., through the double descent phenomenon, that the number of parameters is a poor proxy for model complexity and generalization capabilities. This leaves open the question of understanding the impact of parametrization on the performance of these models. How do model complexity and generalization depend on the number of parameters $p$? How should we choose $p$ relative to the sample size $n$ to achieve optimal test error? In this paper, we investigate the example of random feature ridge regression (RFRR). This model can be seen either as a finite-rank approximation to kernel ridge regression (KRR), or as a simplified model for neural networks trained in the so-called lazy regime. We consider covariates uniformly distributed on the $d$-dimensional sphere and compute sharp asymptotics for the RFRR test error in the high-dimensional polynomial scaling, where $p, n, d \to \infty$ while $p / d^{\kappa_1}$ and $n / d^{\kappa_2}$ stay constant, for all $\kappa_1, \kappa_2 \in \mathbb{R}_{>0}$. These asymptotics precisely characterize the impact of the number of random features and of the regularization parameter on the test performance. In particular, RFRR exhibits an intuitive trade-off between approximation and generalization power. For $n = o(p)$, the sample size $n$ is the bottleneck and RFRR achieves the same performance as KRR (which is equivalent to taking $p = \infty$). On the other hand, if $p = o(n)$, the number of random features $p$ is the limiting factor and the RFRR test error matches the approximation error of the random feature model class (akin to taking $n = \infty$). Finally, a double descent appears at $n = p$, a phenomenon that was previously only characterized in the linear scaling $\kappa_1 = \kappa_2 = 1$.
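To make the setting concrete, the following is a minimal NumPy sketch of random feature ridge regression with covariates drawn uniformly on a sphere. It is an illustration only, not the paper's experimental code: the abstract does not fix the activation, the normalizations, the target function, or the noise level, so the choices here (`tanh` activation, covariates of norm $\sqrt{d}$, unit-norm random weights, a linear target, unscaled ridge penalty `lam`) and all function names are assumptions made for this example.

```python
import numpy as np


def sample_sphere(rng, m, d, radius=1.0):
    """Draw m points uniformly on the sphere of the given radius in R^d."""
    g = rng.standard_normal((m, d))
    return radius * g / np.linalg.norm(g, axis=1, keepdims=True)


def random_features(X, W, sigma=np.tanh):
    """Feature map x -> (sigma(<x, w_1>), ..., sigma(<x, w_p>)); shape (n, p)."""
    return sigma(X @ W.T)


def rfrr_fit(Z, y, lam):
    """Ridge coefficients a = (Z^T Z + lam I)^{-1} Z^T y (assumed penalty scaling)."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)


def rfrr_test_error(n, p, d, lam, rng, n_test=2000, noise=0.1):
    """Monte Carlo estimate of the RFRR test error for one draw of data and weights."""
    W = sample_sphere(rng, p, d)                       # p random weights, unit norm (assumption)
    X = sample_sphere(rng, n, d, radius=np.sqrt(d))    # training covariates, norm sqrt(d) (assumption)
    X_test = sample_sphere(rng, n_test, d, radius=np.sqrt(d))

    def target(A):
        return A[:, 0]                                 # hypothetical linear target function

    y = target(X) + noise * rng.standard_normal(n)     # noisy training labels
    a = rfrr_fit(random_features(X, W), y, lam)
    pred = random_features(X_test, W) @ a
    return float(np.mean((pred - target(X_test)) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, lam = 20, 200, 1e-3
    # Sweep the number of random features p below, at, and above n.
    for p in (50, 200, 800):
        err = rfrr_test_error(n=n, p=p, d=d, lam=lam, rng=rng)
        print(f"p={p:4d}  n={n}  test error ~ {err:.4f}")
```

Sweeping `p` across `n` at small `lam` is one way to probe numerically the regimes the abstract describes ($p = o(n)$, $n = p$, $n = o(p)$). A single small-scale run like this is noisy and should not be read as a check of the asymptotic results.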

