Scaling laws for neural networks, in which the loss decays as a power law in the number of parameters, data, and compute, depend fundamentally on the spectral structure of the data covariance, with power-law eigenvalue decay appearing ubiquitously in vision and language tasks. A central question is whether this spectral structure is preserved or destroyed when data passes through the basic building block of a neural network: a random linear projection followed by a nonlinear activation. We study this question for the random feature model: given data $x \sim N(0,H) \in \mathbb{R}^v$ where $H$ has an $\alpha$-power-law spectrum ($\lambda_j(H) \asymp j^{-\alpha}$, $\alpha > 1$), a Gaussian sketch matrix $W \in \mathbb{R}^{v\times d}$, and an entrywise monomial activation $f(y) = y^{p}$, we characterize the eigenvalues of the population random-feature covariance $\mathbb{E}_{x}[\frac{1}{d}f(W^\top x)^{\otimes 2}]$. We prove matching upper and lower bounds: for all $1 \leq j \leq c_1 d \log^{-(p+1)}(d)$, the $j$-th eigenvalue is of order $\left(\log^{p-1}(j+1)/j\right)^{\alpha}$; for $c_1 d \log^{-(p+1)}(d) \leq j \leq d$, the $j$-th eigenvalue is of order $j^{-\alpha}$ up to a polylogarithmic factor. That is, the power-law exponent $\alpha$ is inherited exactly from the input covariance, modified only by a logarithmic correction that depends on the monomial degree $p$. The proof combines a dyadic head-tail decomposition with Wick chaos expansions for higher-order monomials and random-matrix concentration inequalities.
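The spectral inheritance claimed above can be checked numerically. The sketch below samples $x \sim N(0,H)$ with a diagonal power-law covariance, forms the empirical version of the random-feature covariance $\mathbb{E}_{x}[\frac{1}{d}f(W^\top x)^{\otimes 2}]$, and fits the log-log slope of its head eigenvalues. All sizes (`v`, `d`, `n`) and the choices `alpha = 1.5`, `p = 2` are illustrative, not values from the paper; the fitted slope should sit near $-\alpha$, deformed slightly by the predicted $\log^{p-1}(j+1)$ correction.

```python
import numpy as np

rng = np.random.default_rng(0)
v, d, n = 500, 100, 20_000   # ambient dim, feature dim, Monte Carlo samples
alpha, p = 1.5, 2            # spectral exponent and monomial degree (illustrative)

# Diagonal H with alpha-power-law spectrum: lambda_j(H) = j^{-alpha}
lam = np.arange(1, v + 1, dtype=float) ** (-alpha)

# Sample x ~ N(0, H) via independent coordinates scaled by sqrt(lambda_j)
X = rng.standard_normal((n, v)) * np.sqrt(lam)

# Gaussian sketch W and entrywise monomial features f(W^T x) = (W^T x)^p
W = rng.standard_normal((v, d))
Z = (X @ W) ** p

# Empirical version of E_x[(1/d) f(W^T x)^{(x)2}]: a d x d covariance matrix
C = (Z.T @ Z) / (n * d)
eigs = np.sort(np.linalg.eigvalsh(C))[::-1]

# Log-log slope over the head of the spectrum; the theorem predicts decay
# (log^{p-1}(j+1)/j)^alpha there, i.e. a slope near -alpha up to log terms
j = np.arange(2, 31)
slope = np.polyfit(np.log(j), np.log(eigs[j - 1]), 1)[0]
print(f"fitted log-log slope ~ {slope:.2f} (alpha = {alpha})")
```

Because the log correction flattens the head of the spectrum, the fitted slope is expected to be somewhat shallower than $-\alpha$ over small $j$; extending the fit range deeper into the spectrum moves it toward $-\alpha$.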