We derive generic information-theoretic and PAC-Bayesian generalization bounds involving an arbitrary convex comparator function, which measures the discrepancy between the training and population loss. The bounds hold under the assumption that the cumulant-generating function (CGF) of the comparator is upper-bounded by the corresponding CGF within a family of bounding distributions. We show that the tightest possible bound is obtained when the comparator is the convex conjugate of the CGF of the bounding distribution, also known as the Cramér function. This conclusion applies more broadly to generalization bounds with a similar structure. It confirms the near-optimality of known bounds for bounded and sub-Gaussian losses and leads to novel bounds under other bounding distributions.
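As a small illustration of the Cramér function mentioned above, the following sketch approximates the convex conjugate $\psi^*(x) = \sup_{\lambda} \{\lambda x - \psi(\lambda)\}$ of a CGF $\psi$ on a grid and compares it with the closed form for a zero-mean Gaussian, $\psi(\lambda) = \sigma^2 \lambda^2 / 2$ and $\psi^*(x) = x^2 / (2\sigma^2)$. The Gaussian choice, the function names, and the grid parameters are assumptions made for this example only; they are not taken from the paper.

```python
import math


def gaussian_cgf(lam, sigma):
    """CGF of a zero-mean Gaussian with standard deviation sigma (illustrative choice)."""
    return 0.5 * (sigma ** 2) * (lam ** 2)


def cramer_numeric(x, cgf, lam_max=50.0, steps=100001):
    """Approximate sup over lam in [-lam_max, lam_max] of lam * x - cgf(lam) on a uniform grid.

    Hypothetical helper: the grid range and resolution are arbitrary and must
    contain the maximizer for the approximation to be accurate.
    """
    best = -math.inf
    for i in range(steps):
        lam = -lam_max + 2.0 * lam_max * i / (steps - 1)
        best = max(best, lam * x - cgf(lam))
    return best


def cramer_gaussian(x, sigma):
    """Closed-form convex conjugate of the Gaussian CGF: x^2 / (2 sigma^2)."""
    return x ** 2 / (2.0 * sigma ** 2)


if __name__ == "__main__":
    sigma, x = 0.5, 0.3
    numeric = cramer_numeric(x, lambda lam: gaussian_cgf(lam, sigma))
    closed = cramer_gaussian(x, sigma)
    print(f"numeric: {numeric:.6f}, closed form: {closed:.6f}")
```

Swapping `gaussian_cgf` for the CGF of another bounding distribution gives the corresponding comparator numerically, provided the supremum is attained inside the chosen grid.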