We conduct a detailed investigation of tempered posteriors and uncover several crucial and previously undiscussed points. Contrary to previous results, we first show that, for realistic models and datasets and in the tightly controlled setting of the Laplace approximation to the posterior, stochasticity does not in general improve test accuracy: the coldest temperature is often optimal. One might expect that Bayesian models with some stochasticity at least improve calibration. However, we show empirically that when such gains are obtained, they come at the cost of degraded test accuracy. We then discuss how targeting frequentist metrics with Bayesian models provides a simple explanation for the need for a temperature parameter $\lambda$ in the optimization objective. Finally, contrary to prior work, we show through a PAC-Bayesian analysis that the temperature $\lambda$ cannot be seen as simply fixing a misspecified prior or likelihood.
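For orientation, the role of $\lambda$ can be sketched under one common convention, which is an assumption here and not necessarily the parameterization used in the paper. The symbols $\theta$ (model parameters), $D$ (training data), $p(\theta)$ (prior) and $q$ (candidate posterior) are likewise introduced only for illustration. In this convention the tempered posterior raises the likelihood to the power $\lambda$:

$$
p_\lambda(\theta \mid D) \;\propto\; p(D \mid \theta)^{\lambda}\, p(\theta)
$$

and it coincides with the minimizer over distributions $q$ of the objective

$$
\mathbb{E}_{\theta \sim q}\big[-\log p(D \mid \theta)\big] \;+\; \frac{1}{\lambda}\,\mathrm{KL}\big(q \,\|\, p\big).
$$

Setting $\lambda = 1$ recovers the standard Bayesian posterior. Larger $\lambda$ (a colder temperature, if temperature is read as $1/\lambda$) down-weights the KL term, so the posterior concentrates around the maximum-likelihood solution and carries less stochasticity.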