A unified recipe for deriving (time-uniform) PAC-Bayes bounds

We present a unified framework for deriving PAC-Bayesian generalization bounds. Unlike most previous literature on this topic, our bounds are anytime-valid (i.e., time-uniform), meaning that they hold at all stopping times, not only for a fixed sample size. Our approach combines four tools in the following order: (a) nonnegative supermartingales or reverse submartingales, (b) the method of mixtures, (c) the Donsker-Varadhan formula (or other convex duality principles), and (d) Ville's inequality. Our main result is a PAC-Bayes theorem which holds for a wide class of discrete stochastic processes. We show how this result implies time-uniform versions of well-known classical PAC-Bayes bounds, such as those of Seeger, McAllester, Maurer, and Catoni, in addition to many recent bounds. We also present several novel bounds. Our framework also enables us to relax traditional assumptions; in particular, we consider nonstationary loss functions and non-i.i.d. data. In sum, we unify the derivation of past bounds and ease the search for future bounds: one may simply check if our supermartingale or submartingale conditions are met and, if so, be guaranteed a (time-uniform) PAC-Bayes bound.
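
One way the four tools might fit together in the supermartingale case is sketched below. The notation is assumed for illustration and is not taken from the abstract: M_t(θ) is a process indexed by a parameter θ, ν is a data-independent prior, ρ is an arbitrary posterior, and δ is the error level. This is a sketch under those assumptions, not the paper's exact theorem statement.

```latex
% Illustrative sketch only; all symbols here are assumed notation.
% (a) For each parameter \theta, let (M_t(\theta))_{t \ge 0} be a nonnegative
%     supermartingale with M_0(\theta) \le 1.

% (b) Method of mixtures: average over a data-independent prior \nu.
%     By Tonelli's theorem the mixture is again a nonnegative supermartingale.
\[
  M_t^{\nu} := \int M_t(\theta)\,\nu(d\theta),
  \qquad M_0^{\nu} \le 1 .
\]

% (c) Donsker-Varadhan: for every t and every posterior \rho \ll \nu,
\[
  \mathbb{E}_{\theta\sim\rho}\bigl[\log M_t(\theta)\bigr]
  \;\le\; \mathrm{KL}(\rho \,\|\, \nu) + \log M_t^{\nu} .
\]

% (d) Ville's inequality applied to the mixture process:
\[
  \mathbb{P}\bigl(\exists\, t \ge 0 : M_t^{\nu} \ge 1/\delta\bigr) \;\le\; \delta .
\]

% Combining (c) and (d): with probability at least 1-\delta,
% simultaneously for all t and all posteriors \rho,
\[
  \mathbb{E}_{\theta\sim\rho}\bigl[\log M_t(\theta)\bigr]
  \;\le\; \mathrm{KL}(\rho \,\|\, \nu) + \log(1/\delta) .
\]
```

Under this reading, a specific bound would follow by choosing M_t(θ) so that log M_t(θ) controls the gap between empirical and expected loss; the quantifier over t sits inside the probability, which is what makes the resulting bound time-uniform.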

