Do PAC-Learners Learn the Marginal Distribution?

The Fundamental Theorem of PAC Learning asserts that learnability of a concept class $H$ is equivalent to the $\textit{uniform convergence}$ of empirical error in $H$ to its mean, or equivalently, to the problem of $\textit{density estimation}$: learnability of the underlying marginal distribution with respect to events in $H$. This seminal equivalence relies strongly on PAC learning's `distribution-free' assumption, namely that the adversary may choose any marginal distribution over data. Unfortunately, the distribution-free model is known to be overly adversarial in practice, failing to predict the success of modern machine learning algorithms. Without the Fundamental Theorem, however, our theoretical understanding of learning under distributional constraints remains highly limited. In this work, we revisit the connection between PAC learning, uniform convergence, and density estimation beyond the distribution-free setting, when the adversary is restricted to choosing a marginal distribution from a known family $\mathscr{P}$. We prove that while the traditional Fundamental Theorem indeed fails, a finer-grained connection between the three fundamental notions continues to hold:

1. PAC-Learning is strictly sandwiched between two refined models of density estimation, both equivalent to standard density estimation in the distribution-free case, differing only in whether the learner $\textit{knows}$ the set of well-estimated events in $H$.
2. Under reasonable assumptions on $H$ and $\mathscr{P}$, density estimation is equivalent to $\textit{uniform estimation}$, a relaxation of uniform convergence allowing non-empirical estimators.

Together, our results give a clearer picture of how the Fundamental Theorem extends beyond the distribution-free setting and shed new light on the classically challenging problem of learning under arbitrary distributional assumptions.

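
As a toy illustration of what estimating the marginal "with respect to events in $H$" can look like, the sketch below takes $H$ to be thresholds $[0, t]$ on the unit interval and the marginal to be uniform, and measures how far the empirical mass of every event in $H$ is from its true mass. Both the class and the distribution are assumptions chosen for illustration and are not taken from the paper; the function names `sup_deviation` and `run` are hypothetical. This is the empirical estimator that uniform convergence is about, not the non-empirical estimators that uniform estimation additionally allows.

```python
import random


def sup_deviation(samples):
    """Exact sup over thresholds t in [0, 1] of |empirical mass of [0, t] - t|.

    Assumes the true marginal is uniform on [0, 1], so the true mass of
    the event [0, t] is t.
    """
    xs = sorted(samples)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical mass jumps from (i - 1) / n to i / n at x,
        # so the largest gap near x is attained on one side of the jump.
        worst = max(worst, i / n - x, x - (i - 1) / n)
    return worst


def run(n, seed=0):
    """Draw n uniform samples and return the worst-case estimation error over H."""
    rng = random.Random(seed)
    return sup_deviation([rng.random() for _ in range(n)])


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, run(n))
```

For this class the worst-case error shrinks as the sample grows, which is the behavior uniform convergence guarantees in the distribution-free setting; the abstract's question is what survives of this picture when the marginal is restricted to a known family $\mathscr{P}$.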