In this work, we provide a sharp theory of scaling laws for two-layer neural networks trained on a class of hierarchical multi-index targets, in a genuinely representation-limited regime. We derive exact information-theoretic scaling laws for subspace recovery and prediction error, revealing how the hierarchical features of the target are learned sequentially through a cascade of phase transitions. We further show that these optimal rates are achieved by a simple, target-agnostic spectral estimator, which can be interpreted as the small learning-rate limit of gradient descent on the first-layer weights. Once an adapted representation has been identified, the readout can be learned in a statistically optimal manner using an efficient procedure. As a consequence, we obtain a unified and rigorous explanation of scaling laws, plateau phenomena, and spectral structure in shallow neural networks trained on such hierarchical targets.
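To fix ideas, the display below sketches one standard form that such a target-agnostic spectral estimator can take for multi-index recovery under Gaussian inputs; the particular label-weighted second-moment matrix shown here is an illustrative assumption on our part, not necessarily the exact construction analyzed in this work. Given samples $(x_i, y_i)_{i \le n}$ with $x_i \sim \mathcal{N}(0, I_d)$ and a target depending on a $k$-dimensional index subspace, one forms
\[
  \widehat{M} \;=\; \frac{1}{n} \sum_{i=1}^{n} y_i \bigl( x_i x_i^{\top} - I_d \bigr),
  \qquad
  \widehat{U} \;=\; \operatorname{span}\bigl(\text{top-}k \text{ eigenvectors of } \widehat{M}\bigr),
\]
and the subspace estimate $\widehat{U}$ then plays the role of the adapted representation on which the readout is subsequently fit.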