This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms. The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_{\psi_p}$ Orlicz spaces. Using the decorrelation lemma in combination with other techniques, such as symmetrization, couplings, and chaining in the space of probability measures, we obtain new upper bounds on the generalization error, both in expectation and in high probability, and recover as special cases many of the existing generalization bounds, including the ones based on mutual information, conditional mutual information, stochastic chaining, and PAC-Bayes inequalities. In addition, the Fernique-Talagrand upper bound on the expected supremum of a subgaussian process emerges as a special case.
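The sketch below illustrates two classical ingredients that the abstract builds on. It is not the paper's decorrelation lemma, which is stated in $L_{\psi_p}$ Orlicz spaces. The first ingredient is the Donsker-Varadhan change-of-measure inequality $\mathbb{E}_P[f] \le D(P\|Q) + \log \mathbb{E}_Q[e^{f}]$. The second is the mutual-information bound $|\mathbb{E}[\mathrm{gen}]| \le \sqrt{2\sigma^2 I(S;W)/n}$, which holds when the loss is $\sigma$-subgaussian. The sketch assumes discrete distributions given as Python lists. All function names are hypothetical and chosen only for illustration.

```python
import math

def kl_divergence(p, q):
    """KL divergence D(p || q) in nats for discrete distributions (lists of probabilities)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dv_upper_bound(f, p, q):
    """Right-hand side of the Donsker-Varadhan inequality: D(p||q) + log E_q[exp(f)]."""
    return kl_divergence(p, q) + math.log(sum(qi * math.exp(fi) for qi, fi in zip(q, f)))

def mi_generalization_bound(sigma, mutual_info, n):
    """Mutual-information bound sqrt(2 sigma^2 I(S;W) / n) on |E[gen]|.

    Assumes the loss is sigma-subgaussian for every hypothesis; mutual_info is in nats.
    """
    return math.sqrt(2 * sigma**2 * mutual_info / n)

p = [0.7, 0.2, 0.1]          # "posterior" P
q = [1 / 3, 1 / 3, 1 / 3]    # "reference" Q
f = [1.0, -0.5, 2.0]         # arbitrary test function

# Change of measure: E_p[f] never exceeds D(p||q) + log E_q[exp(f)].
lhs = sum(pi * fi for pi, fi in zip(p, f))
print(lhs <= dv_upper_bound(f, p, q))  # True

# The inequality is tight: f* = log(p/q) makes both sides equal D(p||q).
f_star = [math.log(pi / qi) for pi, qi in zip(p, q)]
gap = dv_upper_bound(f_star, p, q) - sum(pi * fi for pi, fi in zip(p, f_star))
print(abs(gap) < 1e-12)  # True

print(round(mi_generalization_bound(sigma=0.5, mutual_info=2.0, n=1000), 4))  # 0.0316
```

The tightness check shows why a change of measure yields a mutual-information bound. Optimizing the Donsker-Varadhan inequality over a scaling of a subgaussian $f$ gives the square-root form, and averaging over the data yields $I(S;W)$.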