We investigate the properties of random feature ridge regression (RFRR) given by a two-layer neural network with random Gaussian initialization. We study the non-asymptotic behavior of RFRR with nearly orthogonal, deterministic, unit-length input data vectors in the overparameterized regime, where the width of the first layer is much larger than the sample size. We establish high-probability, non-asymptotic concentration of the training error, cross-validation error, and generalization error of RFRR around the corresponding quantities for a kernel ridge regression (KRR). This KRR is derived from the expected kernel generated by a nonlinear random feature map. We then approximate the performance of the KRR by a polynomial kernel matrix obtained from the Hermite polynomial expansion of the activation function, whose degree depends only on how close to orthogonal the data points are. This polynomial kernel determines the asymptotic behavior of both the RFRR and the KRR. Our results hold for a wide variety of activation functions and for input data sets that are nearly orthogonal. Based on these approximations, we obtain a lower bound for the generalization error of the RFRR for a nonlinear student-teacher model.
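The chain of approximations described in the abstract (RFRR, then KRR with the expected kernel, then a polynomial kernel from the Hermite expansion) can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from the paper: the activation is ReLU, whose expected kernel for unit-length inputs has the known closed form $(\sin\theta + (\pi-\theta)\cos\theta)/(2\pi)$; the nearly orthogonal data are simulated by normalizing Gaussian vectors; the ridge parameter `lam`, the sizes `n`, `d`, `N`, and the degree-1 truncation are illustrative choices; and all function names are hypothetical.

```python
import numpy as np

def expected_relu_kernel(X):
    """E_w[relu(w.x) relu(w.y)] for w ~ N(0, I) and unit-norm rows of X."""
    G = np.clip(X @ X.T, -1.0, 1.0)          # pairwise cosines
    theta = np.arccos(G)
    return (np.sin(theta) + (np.pi - theta) * G) / (2.0 * np.pi)

def random_feature_kernel(X, W):
    """Empirical kernel (1/N) F F^T with F = relu(X W^T), N = number of features."""
    F = np.maximum(X @ W.T, 0.0)
    return F @ F.T / W.shape[0]

def degree_one_polynomial_kernel(X):
    """Degree-1 truncation of the ReLU kernel's power series in the cosine.

    K(rho) = 1/(2 pi) + rho/4 + (higher-order terms); the higher-order mass
    is lumped onto the diagonal so that the diagonal still equals K(1) = 1/2.
    The truncation degree is an assumption made for this illustration.
    """
    n = X.shape[0]
    c0, c1 = 1.0 / (2.0 * np.pi), 0.25
    return c0 * np.ones((n, n)) + c1 * (X @ X.T) + (0.5 - c0 - c1) * np.eye(n)

def ridge_in_sample_predictions(K, y, lam):
    """In-sample kernel ridge predictions K (K + lam I)^{-1} y."""
    return K @ np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

rng = np.random.default_rng(0)
n, d, N, lam = 50, 200, 10_000, 0.1          # sample size, input dim, width, ridge

# Nearly orthogonal unit-length inputs: normalized Gaussian vectors in high dimension.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.standard_normal(n)                   # arbitrary labels, for illustration only
W = rng.standard_normal((N, d))              # random Gaussian first-layer weights

K_exp = expected_relu_kernel(X)
K_rf = random_feature_kernel(X, W)
K_poly = degree_one_polynomial_kernel(X)

pred_krr = ridge_in_sample_predictions(K_exp, y, lam)
pred_rfrr = ridge_in_sample_predictions(K_rf, y, lam)

kernel_gap = np.linalg.norm(K_rf - K_exp, 2)     # spectral norm, RF vs expected kernel
poly_gap = np.linalg.norm(K_exp - K_poly, 2)     # spectral norm, expected vs polynomial
rel_pred_gap = np.linalg.norm(pred_rfrr - pred_krr) / np.linalg.norm(pred_krr)

print(f"||K_rf - K_exp||_2   = {kernel_gap:.4f}")
print(f"||K_exp - K_poly||_2 = {poly_gap:.4f}")
print(f"relative gap in in-sample predictions = {rel_pred_gap:.4f}")
```

Increasing the width `N` with `n` fixed should shrink the gap between the random feature kernel and the expected kernel, while increasing `d` makes the data closer to orthogonal and should shrink the gap to the low-degree polynomial kernel. The sketch only checks these two effects empirically on one random draw; it does not reproduce the paper's bounds or its normalization conventions.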