Exactly Tight Information-Theoretic Generalization Error Bound for the Quadratic Gaussian Problem

We provide a new information-theoretic generalization error bound that is exactly tight (i.e., it matches even the constant) for the canonical quadratic Gaussian mean estimation problem. Despite considerable existing effort in deriving information-theoretic generalization error bounds, applying them to this simple setting, where the sample average is used to estimate the mean of Gaussian data, has not yielded satisfactory results. In fact, most existing bounds are order-wise loose in this setting, which has raised concerns about the fundamental capability of information-theoretic bounds to explain the generalization behavior of machine learning algorithms. The proposed bound adopts the individual-sample-based approach of Bu et al., but also has several key new ingredients. First, instead of applying the change-of-measure inequality to the loss function, we apply it to the generalization error function itself; second, the bound is derived in a conditional manner; third, we introduce a reference distribution, which bears a certain similarity to the prior distribution in the Bayesian setting. Combining these components yields a general KL-divergence-based generalization error bound. We further show that although the conditional bounding and the reference distribution can make the bound exactly tight, removing them does not significantly degrade the bound; this leads to a mutual-information-based bound that is also asymptotically tight in this setting.
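The mean estimation setting can be made concrete with a small simulation. As an assumption (the exact model is not spelled out here), take i.i.d. samples Z_i ~ N(mu, sigma^2 I_d), the estimate W = (1/n) * sum_i Z_i, and the squared loss l(w, z) = ||w - z||^2. Under these assumptions a direct calculation gives an expected population risk of (n+1) d sigma^2 / n and an expected empirical risk of (n-1) d sigma^2 / n, so the expected generalization error is exactly 2 d sigma^2 / n; in this assumed setting, that is the quantity an exactly tight bound would have to reproduce, constant included. The sketch below checks the closed form by Monte Carlo. The helpers `simulate_gen_error` and `exact_gen_error` are hypothetical names for illustration, not code from the paper.

```python
import numpy as np


def simulate_gen_error(n: int, d: int, sigma: float, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[L_mu(W) - L_S(W)] for the sample-mean estimator.

    Assumed setting: Z_i ~ N(mu, sigma^2 I_d) i.i.d., W = mean of the n samples,
    loss l(w, z) = ||w - z||^2.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(d)  # the gap does not depend on mu, so mu = 0 is used
    # One training set per trial: shape (trials, n, d)
    samples = mu + sigma * rng.standard_normal((trials, n, d))
    w = samples.mean(axis=1)  # sample-mean estimate per trial, shape (trials, d)
    # Empirical risk: average squared distance from W to its own training samples
    emp_risk = np.mean(np.sum((samples - w[:, None, :]) ** 2, axis=2), axis=1)
    # Population risk in closed form: E_Z ||w - Z||^2 = ||w - mu||^2 + d * sigma^2
    pop_risk = np.sum((w - mu) ** 2, axis=1) + d * sigma**2
    return float(np.mean(pop_risk - emp_risk))


def exact_gen_error(n: int, d: int, sigma: float) -> float:
    """Closed-form expected generalization error 2 * d * sigma^2 / n (assumed setting)."""
    return 2.0 * d * sigma**2 / n


if __name__ == "__main__":
    for n in (5, 20, 100):
        est = simulate_gen_error(n, d=3, sigma=1.5)
        print(f"n={n:4d}  Monte Carlo={est:.4f}  exact={exact_gen_error(n, 3, 1.5):.4f}")
```

The population risk is computed in closed form rather than on fresh test samples to keep the Monte Carlo variance low; the printed estimates should track the exact value and shrink like 1/n as n grows.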

Generalization Error

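The generalization ability of a learning method is the ability of the model it learns to make predictions on unseen data, and it is an essential property of any learning method. In practice, generalization ability is most often assessed by measuring the generalization error on test data. A generalization error bound characterizes the gap between a learning algorithm's empirical risk and its expected risk, as well as the rate at which this gap shrinks. The generalization error of a machine learning method is a function that describes how far the student machine, after learning from sample data, remains from the teacher machine.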
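In symbols, the gap between empirical and expected risk is commonly formalized as below. This is a standard formulation with notation assumed here rather than taken from the source: S = (Z_1, ..., Z_n) is a training set drawn i.i.d. from a distribution mu, W is the hypothesis the algorithm outputs from S, and l is the loss.

```latex
L_S(W) = \frac{1}{n}\sum_{i=1}^{n} \ell(W, Z_i), \qquad
L_\mu(W) = \mathbb{E}_{Z \sim \mu}\bigl[\ell(W, Z)\bigr], \qquad
\overline{\mathrm{gen}} = \mathbb{E}_{S, W}\bigl[L_\mu(W) - L_S(W)\bigr].
```

A generalization error bound upper-bounds the expected gap on the right; its convergence rate is how fast that bound decays as n grows.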