Existing generalization theories for supervised learning typically take a holistic approach and bound the expected generalization error over the whole data distribution, which implicitly assumes that the model generalizes similarly across all classes. In practice, however, generalization performance varies significantly among classes, and existing generalization bounds cannot capture this variation. In this work, we tackle this problem by theoretically studying the class-generalization error, which quantifies the generalization performance of each individual class. We derive a novel information-theoretic bound for the class-generalization error using the KL divergence, and we further obtain several tighter bounds using the conditional mutual information (CMI), which are significantly easier to estimate in practice. We empirically validate the proposed bounds on different neural networks and show that they accurately capture the complex class-generalization error behavior. Moreover, we show that the theoretical tools developed in this paper extend to several applications beyond this context.
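To make the quantity concrete, the sketch below estimates a per-class generalization gap empirically as the test error minus the training error, restricted to the samples of one class. This is an illustrative assumption, not the paper's formal definition: the function names `per_class_error` and `class_generalization_gap`, the use of the 0-1 loss, and the toy labels are all hypothetical.

```python
from collections import defaultdict


def per_class_error(y_true, y_pred):
    """Return {class: 0-1 error rate over the samples whose true label is that class}."""
    totals = defaultdict(int)
    wrong = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t != p:
            wrong[t] += 1
    return {c: wrong[c] / totals[c] for c in totals}


def class_generalization_gap(train_true, train_pred, test_true, test_pred):
    """Empirical per-class gap: test error minus training error for each class.

    Only classes that appear in both the training and test labels are reported.
    """
    train_err = per_class_error(train_true, train_pred)
    test_err = per_class_error(test_true, test_pred)
    return {c: test_err[c] - train_err[c] for c in train_err if c in test_err}


# Toy data (hypothetical): the model fits the training set perfectly,
# but misclassifies half of the class-1 test samples.
gaps = class_generalization_gap(
    train_true=[0, 0, 1, 1], train_pred=[0, 0, 1, 1],
    test_true=[0, 0, 1, 1], test_pred=[0, 0, 1, 0],
)
print(gaps)
```

In this toy case the gap averaged over all samples is 0.25, while the per-class gaps are 0.0 for class 0 and 0.5 for class 1. A bound on the average alone would hide exactly this kind of per-class variation.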