PAC-Bayesian analysis is a frequentist framework for incorporating prior knowledge into learning. It was inspired by Bayesian learning, which allows sequential data processing and naturally turns posteriors from one processing step into priors for the next. However, despite two and a half decades of research, the ability to update priors sequentially without losing confidence information along the way has remained elusive for PAC-Bayes. While PAC-Bayes allows the construction of data-informed priors, the final confidence intervals depend only on the number of points that were not used to construct the prior, whereas the confidence information in the prior, which is related to the number of points used to construct it, is lost. This limits the possibility and benefit of sequential prior updates, because the final bounds depend only on the size of the final batch. We present a novel and, in retrospect, surprisingly simple and powerful PAC-Bayesian procedure that allows sequential prior updates with no information loss. The procedure is based on a novel decomposition of the expected loss of randomized classifiers. The decomposition rewrites the loss of the posterior as an excess loss relative to a downscaled loss of the prior, plus the downscaled loss of the prior itself, which is bounded recursively. As a side result, we also present a generalization of the split-kl and PAC-Bayes-split-kl inequalities to discrete random variables, which we use for bounding the excess losses and which may be of independent interest. In empirical evaluation, the new procedure significantly outperforms the state of the art.
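The loss decomposition described above can be written out as a short worked identity. The notation is an assumption introduced here for illustration and is not taken from the text: $\pi_{t-1}$ denotes the prior, $\pi_t$ the posterior, $L(h)$ the expected loss of a classifier $h$, and $\gamma \in [0,1]$ a downscaling factor, taken to be the same at every step for simplicity.

```latex
% Illustrative notation (assumed): \pi_{t-1} = prior, \pi_t = posterior,
% L(h) = expected loss of classifier h, \gamma = downscaling factor.
\mathbb{E}_{\pi_t}[L(h)]
  = \underbrace{\mathbb{E}_{\pi_t}[L(h)] - \gamma\,\mathbb{E}_{\pi_{t-1}}[L(h')]}_{\text{excess loss}}
  + \underbrace{\gamma\,\mathbb{E}_{\pi_{t-1}}[L(h')]}_{\text{downscaled loss of the prior}}

% Applying the same identity to the prior term T times, with a constant \gamma:
\mathbb{E}_{\pi_T}[L(h)]
  = \sum_{t=1}^{T} \gamma^{\,T-t}
      \Bigl( \mathbb{E}_{\pi_t}[L(h)] - \gamma\,\mathbb{E}_{\pi_{t-1}}[L(h')] \Bigr)
  + \gamma^{T}\,\mathbb{E}_{\pi_0}[L(h)]
```

The first line holds for any $\gamma$, since the downscaled prior term is simply subtracted and added back. The unrolled form is only a sketch of how the recursion might proceed under the constant-$\gamma$ assumption: each excess-loss term would be bounded separately, and the last term is the loss of the initial prior $\pi_0$.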