Double descent refers to the unexpected drop in a learning algorithm's test loss beyond the interpolation threshold as the model becomes over-parameterized. Classical information criteria do not predict this behavior because of limitations in the standard asymptotic approach. We update these analyses using the information risk minimization framework and provide the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) for models learned by stochastic gradient Langevin dynamics (SGLD). Notably, the AIC and BIC penalty terms for SGLD correspond to specific information measures, namely the symmetrized KL information and the KL divergence. We extend this information-theoretic analysis to over-parameterized models by characterizing the SGLD-based BIC for the random feature model in the regime where the number of parameters $p$ and the number of samples $n$ tend to infinity with the ratio $p/n$ held fixed. Our experiments demonstrate that the refined SGLD-based BIC can track the double-descent curve, providing meaningful guidance for model selection and revealing new insights into the behavior of SGLD learning algorithms in the over-parameterized regime.
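For orientation, the classical criteria that the abstract contrasts against are usually written in the textbook form below. This is background only, not the SGLD-based criteria of the paper. The notation is an assumption introduced here: $k$ is the number of free parameters, $n$ the number of samples, and $\hat{\mathcal{L}}$ the maximized likelihood.

$$
\mathrm{AIC} = 2k - 2\ln\hat{\mathcal{L}}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{\mathcal{L}}.
$$

Both penalties grow monotonically with $k$, which is one way to see why these forms would not be expected to reward adding parameters past the interpolation threshold. A common way to write one SGLD step is sketched below. Here too the symbols are assumptions for illustration: $\eta$ is the step size, $\beta$ an inverse temperature, $\hat{R}_{B_t}$ the empirical risk on minibatch $B_t$, and $\xi_t$ standard Gaussian noise.

$$
\theta_{t+1} = \theta_t - \eta\,\nabla_\theta \hat{R}_{B_t}(\theta_t) + \sqrt{\frac{2\eta}{\beta}}\;\xi_t, \qquad \xi_t \sim \mathcal{N}(0, I).
$$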