We provide a new information-theoretic generalization error bound that is exactly tight (i.e., matching even the constant) for the canonical quadratic Gaussian (location) problem. Most existing bounds are order-wise loose in this setting, which has raised concerns about whether information-theoretic bounds are fundamentally capable of explaining generalization behavior in machine learning. The proposed bound adopts the individual-sample-based approach of Bu et al., but adds three key new ingredients. First, instead of applying the change-of-measure inequality to the loss function, we apply it to the generalization error function itself; second, the bound is derived in a conditional manner; third, a reference distribution is introduced. The combination of these components produces a KL-divergence-based generalization error bound. We show that although the latter two ingredients help make the bound exactly tight, removing them does not significantly degrade it, and what remains is an asymptotically tight mutual-information-based bound. We further consider the vector Gaussian setting, where a direct application of the proposed bound again fails to be tight except in special cases. A refined bound is then proposed for decomposable loss functions, which yields a tight bound for the vector setting.
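To make the target of "exactly tight" concrete, the following is a minimal sketch of the scalar quadratic Gaussian problem under assumptions the abstract does not spell out: the samples are i.i.d. N(mu, sigma^2), the learner outputs the sample mean, and the loss is the squared error (w - z)^2. Under these assumptions the expected generalization error works out to 2*sigma^2/n, and this is the constant a tight bound would have to match. The function names and parameters are hypothetical, chosen only for illustration.

```python
import random


def closed_form_gap(n, sigma=1.0):
    """Expected generalization error 2*sigma^2/n under the assumed setup."""
    return 2.0 * sigma ** 2 / n


def simulated_gap(n, sigma=1.0, mu=0.0, trials=20000, seed=0):
    """Monte Carlo estimate of E[population risk - empirical risk].

    Assumed setup (not stated in the abstract): Z_i ~ N(mu, sigma^2) i.i.d.,
    learner W = sample mean, loss (w - z)^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        w = sum(sample) / n
        # Empirical risk: average squared error on the training sample.
        empirical = sum((w - z) ** 2 for z in sample) / n
        # Population risk for a fresh Z ~ N(mu, sigma^2), computed exactly:
        # E[(w - Z)^2] = (w - mu)^2 + sigma^2.
        population = (w - mu) ** 2 + sigma ** 2
        total += population - empirical
    return total / trials


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(n, simulated_gap(n), closed_form_gap(n))
```

The simulated value should approach the closed form as the number of trials grows. A bound that is only order-wise tight would recover the 1/n rate but not the factor 2*sigma^2.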