Empirical power-law scaling has been widely observed across modern deep learning systems, yet its theoretical origins and scope of validity remain incompletely understood. The Generalized Resolution-Shell Dynamics (GRSD) framework models learning as spectral energy transport across logarithmic resolution shells, providing a coarse-grained dynamical description of training. Within GRSD, power-law scaling corresponds to a particularly simple renormalized shell dynamics; however, such behavior is not automatic and requires additional structural properties of the learning process. In this work, we identify a set of sufficient conditions under which the GRSD shell dynamics admits a renormalizable coarse-grained description. These conditions constrain the learning configuration at multiple levels, including boundedness of gradient propagation in the computation graph, weak functional incoherence at initialization, controlled Jacobian evolution along training, and log-shift invariance of renormalized shell couplings. We further show that power-law scaling does not follow from renormalizability alone, but instead arises as a rigidity consequence: once log-shift invariance is combined with the intrinsic time-rescaling covariance of gradient flow, the renormalized GRSD velocity field is forced into a power-law form.
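To make the rigidity mechanism concrete, here is a minimal sketch of the functional-equation argument, assuming the two symmetries act multiplicatively on a scalar shell velocity; the notation ($v$, $\ell$, $\kappa$, $\gamma$) is illustrative and not taken from the GRSD formalism itself. Let $v(\ell, t)$ denote the renormalized velocity field at log-resolution $\ell$ and training time $t$. If log-shift invariance holds in the form $v(\ell + a, t) = \varphi(a)\, v(\ell, t)$ for every shift $a$, then $\varphi(a + b) = \varphi(a)\,\varphi(b)$, so under mild regularity $\varphi(a) = e^{\kappa a}$ for some constant $\kappa$. If time-rescaling covariance of gradient flow is additionally assumed to act as $v(\ell, ct) = c^{\gamma}\, v(\ell, t)$, the same argument in the time variable yields
\[
  v(\ell, t) \;=\; v(0, 1)\, e^{\kappa \ell}\, t^{\gamma},
\]
which is a power law $k^{\kappa} t^{\gamma}$ in the resolution variable $k = e^{\ell}$. The point of the sketch is only that, under these assumptions, the two symmetries together leave no freedom beyond the exponents $\kappa$ and $\gamma$.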