Despite their importance for assessing the reliability of predictions, uncertainty quantification (UQ) measures for machine learning models have only recently begun to be rigorously characterized. One prominent issue is the curse of dimensionality: it is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input dimensions. We prove that by tuning hyperparameters to maximize marginal likelihood (the empirical Bayes procedure), the performance, as measured by the marginal likelihood, improves monotonically with the input dimension. On the other hand, we prove that cross-validation metrics exhibit qualitatively different behavior that is characteristic of double descent. Cold posteriors, which have recently attracted interest due to their improved performance in certain settings, appear to exacerbate these phenomena. We verify empirically that our results hold for real data, beyond our considered assumptions, and we explore consequences involving synthetic covariates.