We provide a new information-theoretic generalization error bound that is exactly tight (i.e., it matches even the constant) for the canonical quadratic Gaussian mean estimation problem. Despite considerable existing effort in deriving information-theoretic generalization error bounds, applying them to this simple setting, where the sample average is used to estimate the mean of Gaussian data, has not yielded satisfying results. In fact, most existing bounds are order-wise loose in this setting, which has raised concerns about the fundamental capability of information-theoretic bounds to explain the generalization behavior of machine learning algorithms. The proposed bound adopts the individual-sample-based approach of Bu et al., but it also has several key new ingredients. First, instead of applying the change-of-measure inequality to the loss function, we apply it to the generalization error function itself; second, the bound is derived in a conditional manner; finally, we introduce a reference distribution that bears some similarity to the prior distribution in the Bayesian setting. Combining these components yields a general KL-divergence-based generalization error bound. We further show that although conditional bounding and the reference distribution can make the bound exactly tight, removing them does not significantly degrade it, which leads to a mutual-information-based bound that is also asymptotically tight in this setting.
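To make the target quantity concrete, the sketch below computes the expected generalization error of the sample-mean estimator. It is not the bound proposed here. It assumes the squared loss ℓ(w, x) = (w − x)² and i.i.d. data X_i ~ N(μ, σ²). Under these assumptions, E[L_μ(W)] = σ² + σ²/n and E[L_S(W)] = (n − 1)σ²/n, so the expected generalization error is 2σ²/n, or 2dσ²/n in d dimensions with covariance σ²I. An exactly tight bound must reproduce this value. The function names are hypothetical, and the Monte Carlo routine covers only the scalar case.

```python
import random


def gen_error_closed_form(sigma: float, n: int, d: int = 1) -> float:
    """Expected generalization error 2*d*sigma^2/n of the sample mean under
    squared loss ||w - x||^2 with X ~ N(mu, sigma^2 I_d) (assumed setting)."""
    return 2.0 * d * sigma ** 2 / n


def gen_error_monte_carlo(mu: float, sigma: float, n: int,
                          trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[L_mu(W) - L_S(W)] for scalar Gaussian data."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        w = sum(sample) / n  # sample-average estimate of the mean
        # Empirical risk: average squared loss on the training sample.
        empirical = sum((w - x) ** 2 for x in sample) / n
        # Population risk in closed form: E_X[(w - X)^2] = sigma^2 + (w - mu)^2.
        population = sigma ** 2 + (w - mu) ** 2
        total += population - empirical
    return total / trials


if __name__ == "__main__":
    print(gen_error_closed_form(1.0, 10))         # exact value 2/10
    print(gen_error_monte_carlo(0.0, 1.0, 10))    # should be close to 0.2
```

The simulation uses the closed-form population risk instead of a held-out sample. This choice isolates the only random quantities, W and the empirical risk, and keeps the estimate low-variance.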