The core of generalization theory was developed for independent observations. Some PAC and PAC-Bayes bounds are available for data that exhibit temporal dependence. However, these bounds involve constants that depend on properties of the data-generating process, such as mixing coefficients, the mixing time, or the spectral gap. Such constants are unknown in practice. In this paper, we prove a new PAC-Bayes bound for Markov chains. This bound depends on a quantity called the pseudo-spectral gap. The main novelty is that we can provide an empirical bound on the pseudo-spectral gap when the state space is finite. Thus, we obtain the first fully empirical PAC-Bayes bound for Markov chains. The result extends beyond the finite case, although this requires additional assumptions. In simulated experiments, the empirical version of the bound is essentially as tight as the non-empirical one.