Minimising upper bounds on the population risk or the generalisation gap has been widely used in structural risk minimisation (SRM); this is, in particular, at the core of PAC-Bayesian learning. Despite its successes and a sustained surge of interest in recent years, the PAC-Bayesian framework has a limitation. Most bounds involve a Kullback-Leibler (KL) divergence term (or one of its variants), which may behave erratically and fail to capture the underlying geometric structure of the learning problem, thereby restricting its use in practical applications. As a remedy, recent studies have attempted to replace the KL divergence in PAC-Bayesian bounds with the Wasserstein distance. Although these bounds alleviate the aforementioned issues to a certain extent, they either hold only in expectation, apply only to bounded losses, or are nontrivial to minimise within an SRM framework. In this work, we contribute to this line of research and prove novel Wasserstein distance-based PAC-Bayesian generalisation bounds both for batch learning with independent and identically distributed (i.i.d.) data and for online learning with potentially non-i.i.d. data. Unlike prior work, our bounds are stronger in the sense that (i) they hold with high probability, (ii) they apply to unbounded (potentially heavy-tailed) losses, and (iii) they lead to optimisable training objectives that can be used in SRM. As a result, we derive novel Wasserstein-based PAC-Bayesian learning algorithms and illustrate their empirical advantage on a variety of experiments.
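The claim that the KL divergence can behave erratically and ignore geometry, while the Wasserstein distance does not, can be seen on a toy case. The sketch below assumes univariate Gaussian prior and posterior with a shared standard deviation σ (an illustrative assumption, not the setting of the bounds above) and uses the standard closed forms for both quantities. The function names `kl_gauss` and `w2_gauss` are hypothetical helpers, not part of any library or of the authors' method.

```python
import math


def kl_gauss(m1: float, s1: float, m2: float, s2: float) -> float:
    """Closed-form KL(N(m1, s1^2) || N(m2, s2^2)) for univariate Gaussians."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5


def w2_gauss(m1: float, s1: float, m2: float, s2: float) -> float:
    """Closed-form 2-Wasserstein distance between univariate Gaussians."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


if __name__ == "__main__":
    # Illustrative only: "posterior" centred at 0, "prior" centred at 1,
    # both concentrating towards point masses as sigma shrinks.
    for sigma in (1.0, 0.1, 0.01):
        kl = kl_gauss(0.0, sigma, 1.0, sigma)
        w2 = w2_gauss(0.0, sigma, 1.0, sigma)
        print(f"sigma={sigma:<5} KL={kl:10.2f} W2={w2:.2f}")
```

With equal variances the KL term reduces to (m1 - m2)^2 / (2σ^2), so it grows without bound as the distributions concentrate, whereas W2 stays equal to the distance |m1 - m2| between the means. This is the geometric sensitivity that motivates Wasserstein-based PAC-Bayesian bounds.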