We study Out-of-Distribution (OOD) generalization in machine learning and propose a general framework for establishing information-theoretic generalization bounds. Our framework interpolates freely between Integral Probability Metric (IPM) and $f$-divergence, which naturally recovers some known results (including Wasserstein- and KL-based bounds) and yields new generalization bounds. Additionally, we show that our framework admits an optimal transport interpretation. When evaluated on two concrete examples, the proposed bounds either strictly improve upon existing bounds in some cases or recover the best among existing OOD generalization bounds. Moreover, by focusing on $f$-divergence and combining it with Conditional Mutual Information (CMI) methods, we derive a family of CMI-based generalization bounds, which include the state-of-the-art ICIMI bound as a special instance. Finally, leveraging these findings, we analyze the generalization of the Stochastic Gradient Langevin Dynamics (SGLD) algorithm, showing that our derived bounds outperform existing information-theoretic generalization bounds in certain scenarios.