Double descent refers to the unexpected drop in the test loss of a learning algorithm beyond the interpolation threshold as the model becomes over-parameterized. Information criteria in their classical forms do not predict this behavior because of limitations of the standard asymptotic approach. We update these analyses using the information risk minimization framework and derive an Akaike Information Criterion (AIC) and a Bayesian Information Criterion (BIC) for models learned by the Gibbs algorithm. Notably, the penalty terms of the Gibbs-based AIC and BIC correspond to specific information measures, namely the symmetrized KL information and the KL divergence, respectively. We extend this information-theoretic analysis to over-parameterized models by providing two different Gibbs-based BICs that compute the marginal likelihood of random feature models in the regime where the number of parameters $p$ and the number of samples $n$ tend to infinity with $p/n$ fixed. Our experiments demonstrate that the Gibbs-based BIC can select the high-dimensional model and reveal the mismatch between marginal likelihood and population risk in the over-parameterized regime, providing new insights into double descent.
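To make the objects named above concrete, the following is a minimal sketch of a Gibbs posterior over the weights of a random feature model, together with its KL divergence to the prior. It is an illustration under stated assumptions, not the construction used in the paper: the squared loss, the isotropic Gaussian prior, the ReLU feature map, the inverse temperature `gamma`, and all function names are hypothetical choices made here because they yield a closed-form Gaussian posterior. Whether this particular KL term matches the penalty in the Gibbs-based BIC is not stated in the abstract.

```python
import numpy as np


def random_features(X, W):
    """ReLU random features (assumed feature map); W is a fixed d x p projection."""
    return np.maximum(X @ W, 0.0) / np.sqrt(W.shape[1])


def gibbs_posterior(Phi, y, gamma, prior_var):
    """Gaussian Gibbs posterior for squared loss and an isotropic Gaussian prior.

    The density is proportional to
        N(w; 0, prior_var * I) * exp(-gamma * L(w)),
    with empirical risk L(w) = ||y - Phi w||^2 / (2 n).
    Both factors are Gaussian in w, so the posterior is Gaussian.
    """
    n, p = Phi.shape
    precision = np.eye(p) / prior_var + (gamma / n) * Phi.T @ Phi
    cov = np.linalg.inv(precision)
    mean = cov @ ((gamma / n) * Phi.T @ y)
    return mean, cov


def kl_to_prior(mean, cov, prior_var):
    """KL( N(mean, cov) || N(0, prior_var * I) ) in nats."""
    p = mean.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (
        np.trace(cov) / prior_var
        + mean @ mean / prior_var
        - p
        + p * np.log(prior_var)
        - logdet
    )


if __name__ == "__main__":
    # Synthetic data (assumed): sweep the number of features p at fixed n.
    rng = np.random.default_rng(0)
    n, d = 50, 5
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
    for p in (10, 50, 200):
        Phi = random_features(X, rng.normal(size=(d, p)))
        mean, cov = gibbs_posterior(Phi, y, gamma=float(n), prior_var=1.0)
        print(p, kl_to_prior(mean, cov, prior_var=1.0))
```

Setting `gamma` to zero removes the data term, so the posterior collapses to the prior and the KL term vanishes, which is a quick sanity check on the closed form. Sweeping `p` past `n` at fixed `n` gives a small-scale analogue of the fixed `p/n` regime described above.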