We revisit the framework of Smart PAC learning, which seeks supervised learners that compete with semi-supervised learners given full knowledge of the marginal distribution on unlabeled data. Prior work has shown that such marginal-by-marginal guarantees are possible for "most" marginals, with respect to an arbitrary fixed and known measure, but not more generally. We discover that this failure can be attributed to an "indistinguishability" phenomenon: there are marginals that cannot be statistically distinguished from other marginals that require different learning approaches. In such settings, a semi-supervised learner cannot certify its guarantees from unlabeled data, rendering them arguably non-actionable. We propose relatively smart learning, a new framework that demands that a supervised learner compete only with the best "certifiable" semi-supervised guarantee. We show that this modest relaxation suffices to bypass the impossibility results from prior work. In the distribution-free setting, we show that the one-inclusion graph (OIG) learner is relatively smart up to squaring the sample complexity, and that no supervised learning algorithm can do better. For distribution-family settings, we show that relatively smart learning can be impossible or can require idiosyncratic learning approaches, and that its difficulty can be non-monotone in the inclusion order on distribution families.