The paper gives a bound on the generalization error of the Gibbs algorithm, which recovers known data-independent bounds in the high-temperature range and extends to the low-temperature range, where generalization depends critically on the data-dependent loss landscape. It is shown that, with high probability, the generalization error of a single hypothesis drawn from the Gibbs posterior decreases with the total prior volume of all hypotheses with similar or smaller empirical error. This gives theoretical support to the belief in the benefit of flat minima. The zero-temperature limit is discussed, and the bound is extended to a class of similar stochastic algorithms.
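For reference, a minimal sketch of the Gibbs posterior in its standard form; the symbols here ($\pi$ for the prior, $\hat{L}_S$ for the empirical error on the sample $S$, $\beta$ for the inverse temperature) are notational assumptions, not taken from the paper:

\[
\hat{\rho}_\beta(h) \;\propto\; \pi(h)\, \exp\!\bigl(-\beta\, \hat{L}_S(h)\bigr).
\]

In this parametrization, the high-temperature range corresponds to small $\beta$ and the low-temperature range to large $\beta$; the zero-temperature limit $\beta \to \infty$ concentrates the posterior on empirical risk minimizers.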