Better-than-KL PAC-Bayes Bounds

Let $f(\theta, X_1), \dots, f(\theta, X_n)$ be a sequence of random elements, where $f$ is a fixed scalar function, $X_1, \dots, X_n$ are independent random variables (data), and $\theta$ is a random parameter distributed according to some data-dependent posterior distribution $P_n$. In this paper, we consider the problem of proving concentration inequalities to estimate the mean of the sequence. An example of such a problem is the estimation of the generalization error of a predictor trained by a stochastic algorithm, such as a neural network, where $f$ is a loss function. Classically, this problem is approached through a PAC-Bayes analysis where, in addition to the posterior, we choose a prior distribution that captures our belief about the inductive bias of the learning problem. The key quantity in PAC-Bayes concentration bounds is then a divergence that captures the complexity of the learning problem, and the de facto standard choice is the KL divergence. However, the tightness of this choice has rarely been questioned. In this paper, we challenge the tightness of KL-divergence-based bounds by showing that it is possible to achieve a strictly tighter bound. In particular, we demonstrate new high-probability PAC-Bayes bounds with a novel, better-than-KL divergence that is inspired by Zhang et al. (2022). Our proof is inspired by recent advances in the regret analysis of gambling algorithms and their use in deriving concentration inequalities. Our result is the first of its kind in that existing PAC-Bayes bounds with non-KL divergences are not known to be strictly better than KL. Thus, we believe our work marks the first step towards identifying optimal rates of PAC-Bayes bounds.
