Exactly Tight Information-Theoretic Generalization Error Bound for the Quadratic Gaussian Problem

We provide a new information-theoretic generalization error bound that is exactly tight (i.e., matching even the constant) for the canonical quadratic Gaussian (location) problem. Most existing bounds are order-wise loose in this setting, which has raised concerns about the fundamental capability of information-theoretic bounds to reason about the generalization behavior of machine learning algorithms. The proposed bound adopts the individual-sample-based approach proposed by Bu et al., but adds several key new ingredients. First, instead of applying the change-of-measure inequality to the loss function, we apply it to the generalization error function itself; second, the bound is derived in a conditional manner; third, a reference distribution is introduced. Together, these components yield a KL-divergence-based generalization error bound. We show that although the latter two ingredients help make the bound exactly tight, removing them does not significantly degrade the bound and still yields an asymptotically tight mutual-information-based bound. We further consider the vector Gaussian setting, where a direct application of the proposed bound again does not lead to tight bounds except in special cases. A refined bound is then proposed for decomposable loss functions, leading to a tight bound for the vector setting.
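To make the quadratic Gaussian (location) setting concrete, here is a minimal sketch under assumptions of our own choosing: i.i.d. samples Z_i ~ N(μ, Σ) with diagonal Σ, the learned hypothesis W is the sample mean, and the loss is the squared Euclidean distance ℓ(w, z) = ||w − z||². Under these assumptions the expected generalization error (population risk minus empirical risk) has the closed form 2 tr(Σ)/n, which reduces to 2σ²/n in the scalar case; this is the value an exactly tight bound has to reproduce, constant included. Because the squared loss decomposes across coordinates, the vector value is simply the sum of the scalar values. The function names `gen_error_mc` and `gen_error_exact` are hypothetical, and the code only checks the closed form by Monte Carlo; it does not implement the bound proposed in the paper.

```python
import math
import random


def gen_error_mc(n, variances, trials=50_000, seed=0):
    """Monte Carlo estimate of E[L_mu(W) - L_S(W)] for the Gaussian location
    problem, with W = sample mean and squared loss ||w - z||^2.

    Assumption (ours, for illustration): Z_i ~ N(0, diag(variances)) i.i.d.
    The mean is set to 0 because the generalization error does not depend on it.
    """
    if not isinstance(variances, (list, tuple)):
        variances = [variances]  # scalar case: d = 1
    variances = [float(v) for v in variances]
    sds = [math.sqrt(v) for v in variances]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Draw n samples, each a d-dimensional vector with independent coordinates.
        Z = [[rng.gauss(0.0, s) for s in sds] for _ in range(n)]
        # Learned location: per-coordinate sample mean.
        W = [sum(col) / n for col in zip(*Z)]
        # Empirical risk: average squared loss over the n training samples.
        emp = sum(sum((z - w) ** 2 for z, w in zip(zi, W)) for zi in Z) / n
        # Population risk in closed form: ||W - mu||^2 + tr(Sigma), with mu = 0.
        pop = sum(w * w for w in W) + sum(variances)
        total += pop - emp
    return total / trials


def gen_error_exact(n, variances):
    """Closed form 2 * tr(Sigma) / n. For the decomposable squared loss this is
    the sum over coordinates of the scalar values 2 * sigma_j^2 / n."""
    if not isinstance(variances, (list, tuple)):
        variances = [variances]
    return 2.0 * sum(float(v) for v in variances) / n


if __name__ == "__main__":
    # Scalar case (sigma^2 = 1, n = 10) and a 3-dimensional diagonal case (n = 5).
    for n, var in [(10, 1.0), (5, [2.0, 0.5, 1.0])]:
        print(n, var, gen_error_mc(n, var), gen_error_exact(n, var))
```

The Monte Carlo estimate should agree with the closed form up to sampling noise. Computing the population risk analytically rather than with a fresh test sample keeps the estimator's variance low, which makes the exact constant 2 easy to see.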



Generalization Error


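The generalization ability of a learning method is the ability of the model it learns to make predictions on unseen data, and it is a fundamentally important property of the method. In practice, generalization ability is most commonly evaluated by measuring the generalization error on test data. A generalization error bound characterizes the gap between a learning algorithm's empirical risk and its expected risk, together with the rate at which this gap converges. In the teacher-student view of learning, the generalization error of a learning machine is a function describing how far the student machine, after learning from sample data, deviates from the teacher machine.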
