The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data. Various information-theoretic bounds on the generalization error have been derived in the literature, where the mutual information between the training data and the hypothesis (the output of the learning algorithm) plays an important role. Focusing on the individual sample mutual information bound by Bu et al., which itself is a tightened version of the first bound on the topic by Russo et al. and Xu et al., this paper investigates the tightness of these bounds, in terms of the dependence of their convergence rates on the sample size $n$. It has been recognized that these bounds are in general not tight, as can be readily verified on the canonical quadratic Gaussian mean estimation problem, where the individual sample mutual information bound scales as $O(\sqrt{1/n})$ while the true generalization error scales as $O(1/n)$. The first contribution of this paper is to show that the same bound can in fact be asymptotically tight if an appropriate assumption is made. In particular, we show that the fast rate can be recovered when the assumption is placed on the excess risk instead of on the loss function, as is usually done in the existing literature. A theoretical justification is given for this choice. The second contribution of the paper is a new set of generalization error bounds based on the $(\eta, c)$-central condition, a condition that is relatively easy to verify and under which the mutual information term directly determines the convergence rate of the bound. Several analytical and numerical examples are given to demonstrate the effectiveness of these bounds.
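The scaling gap mentioned above can be checked numerically. The sketch below (an assumed setup for illustration, not taken from the paper's experiments) considers Gaussian mean estimation with quadratic loss $\ell(w, z) = (w - z)^2$, where the learner outputs the sample mean of $n$ i.i.d. draws from $\mathcal{N}(\mu, \sigma^2)$. A direct calculation gives an expected training loss of $\sigma^2(n-1)/n$ and an expected test loss of $\sigma^2(n+1)/n$, so the true generalization error is exactly $2\sigma^2/n$, i.e. $O(1/n)$, while the individual sample mutual information bound only yields $O(\sqrt{1/n})$:

```python
import numpy as np

# Illustrative setup (assumed, not from the paper's experiments):
# Gaussian mean estimation with quadratic loss l(w, z) = (w - z)^2.
# The learner outputs the sample mean w of n i.i.d. N(0, sigma^2) samples.
# The exact expected generalization error is 2*sigma^2/n, i.e. O(1/n).

rng = np.random.default_rng(0)
sigma = 1.0

def empirical_gen_error(n, trials=200_000):
    """Monte Carlo estimate of E[test loss] - E[training loss]."""
    z = rng.normal(0.0, sigma, size=(trials, n))
    w = z.mean(axis=1)                         # hypothesis: sample mean
    train = ((z - w[:, None]) ** 2).mean()     # average training loss
    z_test = rng.normal(0.0, sigma, size=trials)
    test = ((w - z_test) ** 2).mean()          # loss on a fresh sample
    return test - train

for n in (10, 100):
    print(f"n={n:4d}  empirical={empirical_gen_error(n):.5f}  "
          f"theory 2*sigma^2/n={2 * sigma**2 / n:.5f}")
```

Increasing $n$ tenfold shrinks the empirical generalization error roughly tenfold, matching the $O(1/n)$ rate rather than the $O(\sqrt{1/n})$ rate of the unrefined bound.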