The Fundamental Theorem of PAC Learning asserts that learnability of a concept class $H$ is equivalent to the $\textit{uniform convergence}$ of empirical error in $H$ to its mean, or equivalently, to the problem of $\textit{density estimation}$: learnability of the underlying marginal distribution with respect to events in $H$. This seminal equivalence relies strongly on PAC learning's `distribution-free' assumption, namely that the adversary may choose any marginal distribution over the data. Unfortunately, the distribution-free model is known to be overly adversarial in practice, failing to predict the success of modern machine learning algorithms. Yet without the Fundamental Theorem, our theoretical understanding of learning under distributional constraints remains highly limited. In this work, we revisit the connection between PAC learning, uniform convergence, and density estimation beyond the distribution-free setting, when the adversary is restricted to choosing a marginal distribution from a known family $\mathscr{P}$. We prove that while the traditional Fundamental Theorem indeed fails, a finer-grained connection between the three fundamental notions continues to hold:
1. PAC learning is strictly sandwiched between two refined models of density estimation, differing only in whether the learner $\textit{knows}$ the set of well-estimated events in $H$.
2. Under reasonable assumptions on $H$ and $\mathscr{P}$, density estimation is equivalent to $\textit{uniform estimation}$, a relaxation of uniform convergence that allows non-empirical estimators.
Together, our results give a clearer picture of how the Fundamental Theorem extends beyond the distribution-free setting and shed new light on the classically challenging problem of learning under arbitrary distributional assumptions.
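For concreteness, the two notions the equivalence relates can be written out in a standard form (a sketch in our own notation, with $\mathrm{err}_D$, $\widehat{\mathrm{err}}_S$ denoting true and empirical error; the abstract itself fixes no notation):

```latex
% Uniform convergence over $H$: on an i.i.d. sample $S \sim D^m$,
% every hypothesis' empirical error is simultaneously close to its mean:
\Pr_{S \sim D^m}\!\left[\, \sup_{h \in H}\,
    \bigl|\widehat{\mathrm{err}}_S(h) - \mathrm{err}_D(h)\bigr| \le \varepsilon \right]
  \ \ge\ 1 - \delta .

% Density estimation with respect to events in $H$: output a distribution
% $Q$ whose $H$-induced distance to the true marginal $D$ is small:
\sup_{h \in H}\,
    \bigl|\, \Pr_{x \sim Q}[\,h(x) = 1\,] - \Pr_{x \sim D}[\,h(x) = 1\,] \bigr|
  \ \le\ \varepsilon .
```

Uniform convergence forces the estimator to be the empirical measure itself; the relaxation to arbitrary $Q$ is what the abstract's $\textit{uniform estimation}$ refers to.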