PAC-Bayes learning is an established framework for both assessing the generalisation ability of learning algorithms and designing new learning algorithms by using generalisation bounds as training objectives. Most existing bounds involve a \emph{Kullback-Leibler} (KL) divergence, which fails to capture the geometric properties of the loss function that are often useful in optimisation. We address this by extending the emerging \emph{Wasserstein PAC-Bayes} theory. We develop new PAC-Bayes bounds in which Wasserstein distances replace the usual KL divergence, and demonstrate that sound optimisation guarantees translate into good generalisation abilities. In particular, we provide generalisation bounds for \emph{Bures-Wasserstein SGD} by exploiting its optimisation properties.