We study a variant of Collaborative PAC Learning in which the goal is to learn an accurate classifier for each of $n$ data distributions while minimizing the total number of samples drawn from them. Unlike the usual collaborative learning setup, we do not assume that a single classifier is simultaneously accurate for all distributions. We show that sample-efficient learning remains feasible when the data distributions satisfy a weaker realizability assumption, which appeared in [Crammer and Mansour, 2012] in the context of multi-task learning. We give a learning algorithm based on Empirical Risk Minimization (ERM) over a natural augmentation of the hypothesis class; its analysis relies on an upper bound on the VC dimension of this augmented class. In terms of computational efficiency, we show that ERM on the augmented hypothesis class is NP-hard, which gives evidence against the existence of computationally efficient learners in general. On the positive side, for two special cases, we give learners that are both sample-efficient and computationally efficient.
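To make the idea of ERM over an augmented hypothesis class concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes (the abstract does not spell this out) that the weaker realizability assumption means a small number `k` of classifiers suffices to cover all $n$ distributions, and that an augmented hypothesis is a choice of `k` base classifiers together with an assignment of each distribution to one of them. The function name `augmented_erm`, the parameter `k`, and the toy threshold class are all hypothetical. The search enumerates every `k`-subset of a finite class, so its running time grows exponentially in `k`, which is in line with the hardness result stated above.

```python
from itertools import combinations


def augmented_erm(hypotheses, samples, k):
    """Brute-force ERM over an (assumed) augmented class.

    hypotheses: finite list of callables x -> label (requires k <= len(hypotheses)).
    samples:    samples[i] is a list of (x, y) pairs drawn from distribution i.
    Returns (subset, assignment, total_errors), where assignment[i] is the
    index of the hypothesis used for distribution i.
    """
    # errors[j][i] = number of mistakes of hypothesis j on the sample from distribution i
    errors = [[sum(h(x) != y for x, y in s) for s in samples] for h in hypotheses]

    best = None
    for subset in combinations(range(len(hypotheses)), k):
        # For a fixed subset, each distribution independently picks its best member.
        assignment = [min(subset, key=lambda j: errors[j][i])
                      for i in range(len(samples))]
        total = sum(errors[j][i] for i, j in enumerate(assignment))
        if best is None or total < best[2]:
            best = (subset, assignment, total)
    return best


# Toy example (hypothetical): threshold classifiers on the real line.
hypotheses = [lambda x, t=t: int(x >= t) for t in (1, 2, 3)]
samples = [
    [(0, 0), (1, 1), (2, 1)],  # distribution 0, labeled by threshold 1
    [(1, 0), (2, 0), (3, 1)],  # distribution 1, labeled by threshold 3
    [(0, 0), (2, 1)],          # distribution 2, consistent with threshold 1
]

print(augmented_erm(hypotheses, samples, k=2))  # ((0, 2), [0, 2, 0], 0)
```

In this toy instance no single threshold fits all three samples (the best one makes 2 mistakes in total), while two thresholds fit them exactly, which illustrates why dropping the single-classifier assumption changes the problem.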