In this work, we investigate the generalization performance of random feature ridge regression (RFRR). Our main contribution is a general deterministic equivalent for the test error of RFRR. Specifically, under a certain concentration property, we show that the test error is well approximated by a closed-form expression that depends only on the eigenvalues of the feature map. Notably, our approximation guarantee is non-asymptotic, multiplicative, and independent of the feature map dimension, which allows for infinite-dimensional features. We expect this deterministic equivalent to hold broadly beyond our theoretical analysis, and we empirically validate its predictions on various real and synthetic datasets. As an application, we derive sharp excess error rates under standard power-law assumptions on the spectrum and target decay. In particular, we provide a tight result for the smallest number of features needed to achieve the optimal minimax error rate.