We present a new family of information-theoretic generalization bounds, in which the training loss and the population loss are compared through a jointly convex function. This function is upper-bounded in terms of the disintegrated, samplewise, evaluated conditional mutual information (CMI). This information measure depends on the losses incurred by the selected hypothesis rather than on the hypothesis itself, in contrast to the hypothesis-based measures common in probably approximately correct (PAC)-Bayesian results. We demonstrate the generality of this framework by recovering and extending previously known information-theoretic bounds. Furthermore, using the evaluated CMI, we derive a samplewise, average version of Seeger's PAC-Bayesian bound, in which the convex function is the binary KL divergence. In some scenarios, this novel bound gives a tighter characterization of the population loss of deep neural networks than previous bounds. We also derive high-probability versions of some of these average bounds. Finally, we demonstrate the unifying nature of the evaluated CMI bounds by using them to recover average and high-probability generalization bounds for multiclass classification with finite Natarajan dimension.
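Seeger-type bounds control the binary KL divergence d(q || p) = q log(q/p) + (1 - q) log((1 - q)/(1 - p)) between a training loss q and a population loss p, both in [0, 1]. To read off an explicit upper bound on p, one inverts d in its second argument: p is at most the largest value with d(q || p) <= B, where B is the right-hand side of the bound. The following is a minimal Python sketch of that generic inversion step. It is not the authors' code. It assumes losses in [0, 1] and treats B (for instance, an evaluated-CMI term) as a given number. The function names `binary_kl` and `kl_inverse_upper` are hypothetical, and bisection is simply one convenient solver.

```python
import math


def binary_kl(q: float, p: float) -> float:
    """Binary KL divergence d(q || p), using the convention 0 * log 0 = 0."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp p away from 0 and 1 to avoid log(0)
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def kl_inverse_upper(q: float, bound: float, tol: float = 1e-10) -> float:
    """Largest p in [q, 1] with d(q || p) <= bound, found by bisection.

    Bisection is valid because d(q || p) is increasing in p on [q, 1].
    """
    lo, hi = q, 1.0
    if binary_kl(q, hi) <= bound:
        return 1.0  # the bound is vacuous: every p in [q, 1] is feasible
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(q, mid) <= bound:
            lo = mid  # mid is feasible, so search higher
        else:
            hi = mid  # mid violates the bound, so search lower
    return lo


if __name__ == "__main__":
    train_loss, rhs = 0.05, 0.1  # illustrative numbers, not results from the paper
    kl_bound = kl_inverse_upper(train_loss, rhs)
    pinsker_bound = train_loss + math.sqrt(rhs / 2.0)
    print(f"KL-inverse bound: {kl_bound:.4f}, Pinsker-based bound: {pinsker_bound:.4f}")
```

By Pinsker's inequality, d(q || p) >= 2(p - q)^2. The KL inverse is therefore never larger than q + sqrt(B/2), and it is markedly smaller when the training loss is close to zero. This is the usual motivation for using the binary KL divergence as the convex comparison function.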