Random feature models (RFMs), two-layer networks with a randomly initialized, fixed first layer and a trained linear readout, are among the simplest nonlinear predictors. Prior asymptotic analyses in the proportional high-dimensional regime show that, under isotropic data, RFMs reduce to noisy linear models and offer no advantage over classical linear methods such as ridge regression. Yet RFMs frequently outperform linear baselines on structured real data. We show that this tension is explained by a correlation-driven phase transition: under spiked-covariance designs, the interaction between anisotropy and input-label correlation determines whether the RFM behaves as an effectively linear predictor or exhibits genuinely nonlinear gains. Concretely, we establish a universality principle under anisotropy and characterize the RFM generalization error via an equivalent noisy polynomial model. The effective degree of this polynomial (equivalently, which Hermite orders of the activation survive) is governed by the strength of the input-label correlation, yielding an explicit phase boundary in the plane of correlation strength and spike magnitude. Below the boundary, the RFM collapses to a linear surrogate and can underperform strong linear baselines; above it, higher-order terms persist and the RFM achieves a clear nonlinear advantage. Numerical simulations and real-data experiments corroborate the theory and delineate the transition between these two regimes.
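To make the setup concrete, the following is a minimal, self-contained sketch (not the paper's exact construction) of a random feature model with a fixed random first layer and a ridge-trained linear readout, evaluated against a plain ridge baseline on synthetic spiked-covariance data. All parameter names (`spike`, `rho`, the label function, etc.) are illustrative assumptions standing in for the anisotropy strength and the input-label correlation discussed above.

```python
# Illustrative sketch: RFM (fixed random features + ridge readout) vs. linear ridge
# on spiked-covariance data. Parameter choices are hypothetical, for demonstration only.
import numpy as np

rng = np.random.default_rng(0)
d, p, n_train, n_test = 200, 800, 2000, 2000
spike, rho, lam = 25.0, 0.8, 1e-2   # spike magnitude, input-label correlation proxy, ridge penalty

u = rng.standard_normal(d)
u /= np.linalg.norm(u)              # spike (anisotropy) direction

def sample(n):
    # Spiked covariance: isotropic bulk plus a rank-one spike along u.
    x = rng.standard_normal((n, d)) + np.sqrt(spike) * rng.standard_normal((n, 1)) * u
    s = x @ u / np.sqrt(1.0 + spike)   # standardized projection onto the spike
    # Label depends nonlinearly on the spiked direction; rho scales the signal strength.
    y = rho * (s + np.tanh(s)) + 0.1 * rng.standard_normal(n)
    return x, y

def ridge_fit(F, y, lam):
    # Closed-form ridge regression for the linear readout.
    return np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ y)

Xtr, ytr = sample(n_train)
Xte, yte = sample(n_test)

W = rng.standard_normal((d, p)) / np.sqrt(d)     # fixed, randomly initialized first layer
phi = lambda X: np.maximum(X @ W, 0.0)           # ReLU random features

a = ridge_fit(phi(Xtr), ytr, lam)                # trained linear readout on random features
b = ridge_fit(Xtr, ytr, lam)                     # linear ridge baseline on raw inputs

print("RFM   test MSE:", np.mean((phi(Xte) @ a - yte) ** 2))
print("Ridge test MSE:", np.mean((Xte @ b - yte) ** 2))
```

Sweeping `rho` and `spike` in a grid with this kind of script is one way to trace, empirically, the boundary between the regime where the RFM matches the linear baseline and the regime where it gains a nonlinear advantage.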