How do we compare hypotheses that are entirely consistent with observations? The marginal likelihood (also known as the Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam's razor. Although it has been observed that the marginal likelihood can overfit and is sensitive to prior assumptions, its limitations for hyperparameter learning and discrete model comparison have not been thoroughly investigated. We first revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing. We then highlight the conceptual and practical issues in using the marginal likelihood as a proxy for generalization. Namely, we show how the marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and can lead to both underfitting and overfitting in hyperparameter learning. We also re-examine the connection between the marginal likelihood and PAC-Bayes bounds and use this connection to further elucidate the shortcomings of the marginal likelihood for model selection. We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.
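To make the two quantities concrete, here is a minimal sketch on a toy Beta-Bernoulli coin model, which is an assumption chosen for illustration and does not appear in the abstract. It computes the log marginal likelihood in closed form and, via the chain rule, a conditional version that scores the later observations given the first m. The function names and the prior parameters `a` and `b` are hypothetical, and the paper's own definition of the conditional marginal likelihood may differ in details such as how the data are split or ordered.

```python
import math


def log_beta(a, b):
    # Log of the Beta function, written with log-gamma for numerical stability.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal_likelihood(flips, a=1.0, b=1.0):
    # log p(D) for an ordered 0/1 sequence, with a Beta(a, b) prior on the
    # heads probability integrated out analytically.
    heads = sum(flips)
    tails = len(flips) - heads
    return log_beta(a + heads, b + tails) - log_beta(a, b)


def log_conditional_marginal_likelihood(flips, m, a=1.0, b=1.0):
    # Chain rule: log p(D_{>m} | D_{<=m}) = log p(D) - log p(D_{<=m}).
    # Equivalent to scoring the later points under the posterior from the first m.
    return log_marginal_likelihood(flips, a, b) - log_marginal_likelihood(flips[:m], a, b)


if __name__ == "__main__":
    flips = [1, 1, 0, 1, 1, 1, 0, 1]
    # Two illustrative priors: a flat one and one concentrated near 0.5.
    for label, (a, b) in [("flat prior", (1.0, 1.0)), ("prior near 0.5", (50.0, 50.0))]:
        lml = log_marginal_likelihood(flips, a, b)
        clml = log_conditional_marginal_likelihood(flips, 4, a, b)
        print(f"{label}: log marginal = {lml:.3f}, log conditional (m=4) = {clml:.3f}")
```

The marginal likelihood scores every observation, including the earliest ones, which are predicted from the prior alone. The conditional variant drops those first m terms, so under this toy model it depends on the prior only through the posterior formed from the first m points.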