The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data. Various information-theoretic bounds on the generalization error have been derived in the literature, where the mutual information between the training data and the hypothesis (the output of the learning algorithm) plays an important role. Focusing on the individual sample mutual information bound by Bu et al., which itself is a tightened version of the first bound on the topic by Russo et al. and Xu et al., this paper investigates the tightness of these bounds, in terms of the dependence of their convergence rates on the sample size $n$. It has been recognized that these bounds are in general not tight, as can be readily verified on the canonical quadratic Gaussian mean estimation problem, where the individual sample mutual information bound scales as $O(\sqrt{1/n})$ while the true generalization error scales as $O(1/n)$. The first contribution of this paper is to show that the same bound can in fact be asymptotically tight if an appropriate assumption is made. In particular, we show that the fast rate can be recovered when the assumption is placed on the excess risk instead of on the loss function, as is usually done in the existing literature. A theoretical justification is given for this choice. The second contribution of the paper is a new set of generalization error bounds based on the $(\eta, c)$-central condition, a condition that is relatively easy to verify and under which the mutual information term directly determines the convergence rate of the bound. Several analytical and numerical examples are given to demonstrate the effectiveness of these bounds.
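The scaling gap mentioned above can be checked numerically. The sketch below (an assumed setup for illustration, not taken from the paper's experiments) considers Gaussian mean estimation with quadratic loss $\ell(w, z) = (w - z)^2$, where the learner outputs the sample mean of $n$ i.i.d. draws from $\mathcal{N}(\mu, \sigma^2)$. A direct calculation gives an expected training loss of $\sigma^2(n-1)/n$ and an expected test loss of $\sigma^2(n+1)/n$, so the true generalization error is exactly $2\sigma^2/n$, i.e. $O(1/n)$, while the individual sample mutual information bound only yields $O(\sqrt{1/n})$:

```python
import numpy as np

# Illustrative setup (assumed, not from the paper's experiments):
# Gaussian mean estimation with quadratic loss l(w, z) = (w - z)^2.
# The learner outputs the sample mean w of n i.i.d. N(0, sigma^2) samples.
# The exact expected generalization error is 2*sigma^2/n, i.e. O(1/n).

rng = np.random.default_rng(0)
sigma = 1.0

def empirical_gen_error(n, trials=200_000):
    """Monte Carlo estimate of E[test loss] - E[training loss]."""
    z = rng.normal(0.0, sigma, size=(trials, n))
    w = z.mean(axis=1)                         # hypothesis: sample mean
    train = ((z - w[:, None]) ** 2).mean()     # average training loss
    z_test = rng.normal(0.0, sigma, size=trials)
    test = ((w - z_test) ** 2).mean()          # loss on a fresh sample
    return test - train

for n in (10, 100):
    print(f"n={n:4d}  empirical={empirical_gen_error(n):.5f}  "
          f"theory 2*sigma^2/n={2 * sigma**2 / n:.5f}")
```

Increasing $n$ tenfold shrinks the empirical generalization error roughly tenfold, matching the $O(1/n)$ rate rather than the $O(\sqrt{1/n})$ rate of the unrefined bound.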