We investigate the properties of random feature ridge regression (RFRR) given by a two-layer neural network with random Gaussian initialization. We study the non-asymptotic behavior of RFRR with nearly orthogonal, deterministic, unit-length input data vectors in the overparameterized regime, where the width of the first layer is much larger than the sample size. Our analysis establishes high-probability, non-asymptotic concentration of the training errors, cross-validation errors, and generalization errors of RFRR around their respective values for a kernel ridge regression (KRR). This KRR is defined by the expected kernel generated by a nonlinear random feature map. We then approximate the performance of the KRR using a polynomial kernel matrix obtained from the Hermite polynomial expansion of the activation function, whose degree depends only on the orthogonality among different data points. This polynomial kernel determines the asymptotic behavior of both the RFRR and the KRR. Our results hold for a wide variety of activation functions and for input data sets that are nearly orthogonal. Based on these approximations, we obtain a lower bound on the generalization error of the RFRR for a nonlinear student-teacher model.
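The concentration of RFRR around its expected-kernel KRR can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the construction analyzed in the paper. The ReLU activation, the 1/sqrt(N) feature normalization, the ridge parameter, the problem sizes, and the random points on the sphere (a stand-in for nearly orthogonal deterministic data) are all illustrative choices, and the function names are hypothetical. For ReLU and unit-length inputs the expected kernel has the known closed form (sin θ + (π − θ) cos θ) / (2π), where θ is the angle between two inputs.

```python
import numpy as np


def unit_rows(rng, n, d):
    """Draw n random unit vectors in R^d (nearly orthogonal when d is large)."""
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def relu_expected_kernel(X):
    """E_w[relu(w.x) relu(w.y)] for w ~ N(0, I) and unit-length rows x, y of X."""
    G = np.clip(X @ X.T, -1.0, 1.0)  # cosines of pairwise angles
    theta = np.arccos(G)
    return (np.sin(theta) + (np.pi - theta) * G) / (2.0 * np.pi)


def random_feature_kernel(X, W):
    """Empirical kernel Phi^T Phi with Phi = relu(W X^T) / sqrt(N)."""
    Phi = np.maximum(W @ X.T, 0.0) / np.sqrt(W.shape[0])  # shape (N, n)
    return Phi.T @ Phi


def ridge_train_error(K, y, lam):
    """Mean squared training residual of ridge regression with kernel K."""
    n = len(y)
    fitted = K @ np.linalg.solve(K + lam * np.eye(n), y)
    resid = y - fitted
    return float(resid @ resid / n)


# Illustrative sizes (assumptions): width N much larger than sample size n.
rng = np.random.default_rng(0)
n, d, N, lam = 50, 200, 20000, 0.1

X = unit_rows(rng, n, d)
W = rng.standard_normal((N, d))   # random Gaussian first-layer weights
y = rng.standard_normal(n)        # arbitrary labels, for illustration only

K = relu_expected_kernel(X)       # kernel of the limiting KRR
K_N = random_feature_kernel(X, W) # kernel induced by the random features

err_krr = ridge_train_error(K, y, lam)
err_rfrr = ridge_train_error(K_N, y, lam)

print("operator-norm gap:", np.linalg.norm(K_N - K, 2))
print("training error (KRR): ", err_krr)
print("training error (RFRR):", err_rfrr)
```

Increasing `N` with `n` fixed should shrink both the operator-norm gap between the two kernel matrices and the gap between the two training errors, which is the qualitative behavior the concentration results describe.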