We study a foundational variant of Valiant's and Vapnik and Chervonenkis' Probably Approximately Correct (PAC) learning in which the adversary is restricted to a known family of marginal distributions $\mathscr{P}$. In particular, we study how the PAC-learnability of a triple $(\mathscr{P},X,H)$ relates to the learner's ability to infer \emph{distributional} information about the adversary's choice of $D \in \mathscr{P}$. To this end, we introduce the `unsupervised' notion of \emph{TV-learning}, which, given a class $(\mathscr{P},X,H)$, asks the learner to approximate $D$ from unlabeled samples with respect to a natural class-conditional total variation metric. In the classical distribution-free setting, we show that TV-learning is \emph{equivalent} to PAC-learning: in other words, any PAC-learner must infer near-maximal information about $D$. On the other hand, we show that this characterization breaks down for general $\mathscr{P}$, where PAC-learning is strictly sandwiched between two approximate variants we call `Strong' and `Weak' TV-learning. Both roughly correspond to unsupervised learners that estimate most relevant distances in $D$ with respect to $H$, but they differ in whether the learner \emph{knows} the set of well-estimated events. Finally, we observe that TV-learning is in fact equivalent to the classical notion of \emph{uniform estimation}, and thereby give a strong refutation of the uniform convergence paradigm in supervised learning.