This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms. The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_{\psi_p}$ Orlicz spaces. Using the decorrelation lemma in combination with other techniques, such as symmetrization, couplings, and chaining in the space of probability measures, we obtain new upper bounds on the generalization error, both in expectation and in high probability, and recover as special cases many of the existing generalization bounds, including the ones based on mutual information, conditional mutual information, stochastic chaining, and PAC-Bayes inequalities. In addition, the Fernique-Talagrand upper bound on the expected supremum of a subgaussian process emerges as a special case.