Understanding the minimal assumptions necessary for generalization is the fundamental question in learning theory. Unfortunately, most results rely heavily on independence (or some proxy thereof) of the data-generating process, while results for strongly dependent data are far more limited. Towards addressing this gap, we introduce the framework of simulatable processes, where the learner has access to a simulator that approximates the distribution generating the data (which may be an arbitrarily complex and dependent process). Surprisingly, given access to such a simulator, we show that we can recover the same learning guarantees as in the classical setting with independent data, namely, error bounds that depend on the VC dimension. Further, we use this framework to study the power of conditional sampling and show strict statistical and computational advantages in this setting. As a highlight of our framework, we exhibit a single algorithm that simultaneously learns any given VC class under all processes samplable in bounded polynomial time, with regret controlled by the time-bounded Kolmogorov complexity of the process. This provides a significant conceptual broadening of the classical PAC model.