Gradient-Flow Optimization as Dynamic Random-Effects Inference: Testing and Early Stopping with Applications to Deep Learning

Gradient-flow optimization is usually viewed as an algorithmic procedure for minimizing empirical loss, with training duration selected by validation or heuristic early-stopping rules. We develop a statistical inference framework for the gradient-flow training trajectory itself. The central object is fixed-operator squared-error gradient flow: whenever the fitted values evolve under a time-invariant positive semidefinite training operator, the trained model output at each training time is exactly equal to the best linear unbiased predictor (BLUP), equivalently the empirical-Bayes posterior mean, under a corresponding random-effects model. In this representation, training time becomes a variance-component parameter that governs how variance is reallocated from residual noise to structured signal. Two basic training decisions thereby become inferential problems. First, whether training is needed is formulated as a variance-component test for signal beyond initialization. Second, how long to train is formulated as restricted maximum likelihood (REML) estimation of the training-time variance component. The resulting REML-guided early-stopping rule has a spectral interpretation: it selects the training time at which the optimized spectral losses become empirically decorrelated from the eigenvalues of the training operator, yielding an effective degrees-of-freedom measure for the evolving trained model. We establish asymptotic prediction optimality for fixed-design in-sample risk and, under additional kernel regularity conditions, for random-design out-of-sample risk. Deep learning models in fixed-kernel gradient regimes provide canonical modern-AI instances of the theory. Numerical experiments and a UK Biobank proteomics application show that the proposed inferential approach attains competitive prediction accuracy while reducing reliance on validation splits and repeated checkpoint evaluation.
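A short derivation makes the claimed equivalence concrete. The normalizations here are our own illustrative assumptions, not the paper's stated parametrization: zero initialization (otherwise replace y by y - f_0), loss (1/2)||y - f||^2 with the learning rate absorbed into K, a random-effects model with no fixed effects, and a single scale sigma^2 shared by signal and noise. Under these assumptions the signal covariance sigma^2 (e^{tK} - I) is positive semidefinite for t >= 0 and vanishes at t = 0, so training time acts as a variance component.

```latex
\begin{aligned}
&\text{Gradient flow:} && \frac{d f_t}{dt} = -K (f_t - y),\quad f_0 = 0
  \;\Longrightarrow\; f_t = \bigl(I - e^{-tK}\bigr) y. \\
&\text{Random effects:} && y = u + \varepsilon,\quad
  u \sim N\!\bigl(0, \sigma^2 (e^{tK} - I)\bigr),\quad
  \varepsilon \sim N(0, \sigma^2 I),\quad
  \operatorname{Var}(y) = \sigma^2 e^{tK}. \\
&\text{BLUP:} && \hat u_t = \operatorname{Cov}(u, y)\,\operatorname{Var}(y)^{-1} y
  = (e^{tK} - I)\, e^{-tK} y = \bigl(I - e^{-tK}\bigr) y = f_t. \\
&\text{Profile likelihood:} && K = U \Lambda U^\top,\quad z = U^\top y,\quad
  r_i(t) = z_i^2 e^{-t\lambda_i},\quad
  \hat\sigma^2(t) = \bar r(t) = \tfrac{1}{n} \textstyle\sum_{i} r_i(t), \\
& && \frac{\partial \ell_p}{\partial t}
  = \frac{1}{2\,\bar r(t)} \sum_{i=1}^{n} (\lambda_i - \bar\lambda)\,\bigl(r_i(t) - \bar r(t)\bigr). \\
&\text{Effective d.f.:} && \operatorname{df}(t) = \operatorname{tr}\bigl(I - e^{-tK}\bigr)
  = \sum_{i=1}^{n} \bigl(1 - e^{-t\lambda_i}\bigr).
\end{aligned}
```

Setting the derivative to zero says that, at an interior optimum, the eigenvalues lambda_i and the quantities r_i(t) are empirically uncorrelated. This mirrors the decorrelation interpretation above, although r_i(t) is simply the quantity that arises in this simplified sketch and need not coincide with the paper's optimized spectral losses. The hypothesis t = 0, under which Var(y) = sigma^2 I, corresponds to asking whether training is needed at all.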
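The equivalence between the gradient-flow trajectory and the BLUP can also be checked numerically. The sketch below is our own illustration, not the paper's code. It assumes zero initialization, dynamics df/dt = -K(f - y) with the learning rate absorbed into K, and the random-effects model y = u + e with Var(u) = sigma2 (exp(tK) - I) and Var(e) = sigma2 I. The helper names gradient_flow_fit, blup_fit and euler_gradient_flow are hypothetical, and the linear-kernel operator is an arbitrary test case.

```python
import numpy as np


def gradient_flow_fit(K, y, t):
    """Closed-form solution of df/dt = -K (f - y), f(0) = 0: f_t = (I - exp(-tK)) y."""
    lam, U = np.linalg.eigh(K)
    shrink = 1.0 - np.exp(-t * lam)  # shrinkage factor for each eigendirection of K
    return U @ (shrink * (U.T @ y))


def blup_fit(K, y, t, sigma2=1.0):
    """BLUP E[u | y] for y = u + e, u ~ N(0, sigma2 (exp(tK) - I)), e ~ N(0, sigma2 I)."""
    lam, U = np.linalg.eigh(K)
    G = sigma2 * (U * (np.exp(t * lam) - 1.0)) @ U.T  # signal covariance Var(u)
    V = G + sigma2 * np.eye(y.size)                     # marginal covariance Var(y)
    return G @ np.linalg.solve(V, y)                    # Cov(u, y) Var(y)^{-1} y


def euler_gradient_flow(K, y, t, n_steps=20000):
    """Explicit Euler integration of df/dt = -K (f - y) from f(0) = 0 up to time t."""
    f = np.zeros_like(y)
    dt = t / n_steps
    for _ in range(n_steps):
        f = f - dt * (K @ (f - y))
    return f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 10))
    K = X @ X.T / 10  # hypothetical fixed PSD operator (a linear kernel)
    y = rng.standard_normal(50)
    t = 0.5
    f_t = gradient_flow_fit(K, y, t)
    print("closed form == BLUP:", np.allclose(f_t, blup_fit(K, y, t)))
    print("max |closed form - Euler|:", np.max(np.abs(f_t - euler_gradient_flow(K, y, t))))
```

Running the file should show the closed form agreeing with the BLUP up to floating-point error. It should also show a small discrepancy with the Euler path that shrinks with the step size. Because sigma2 cancels in Cov(u, y) Var(y)^{-1}, the match holds for any sigma2, so the overall scale carries no information about the stopping time in this parametrization.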
