We study a variant of Collaborative PAC Learning in which we aim to learn an accurate classifier for each of the $n$ data distributions while minimizing the total number of samples drawn from them. Unlike in the usual collaborative learning setup, we do not assume that a single classifier is simultaneously accurate for all distributions. We show that sample-efficient learning remains feasible when the data distributions satisfy a weaker realizability assumption. We give a learning algorithm based on Empirical Risk Minimization (ERM) on a natural augmentation of the hypothesis class; its analysis relies on an upper bound on the VC dimension of this augmented class. In terms of computational efficiency, we show that ERM on the augmented hypothesis class is NP-hard, which gives evidence against the existence of computationally efficient learners in general. On the positive side, for two special cases, we give learners that are both sample- and computationally efficient.
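The abstract does not spell out the augmented hypothesis class, so the following is only a minimal illustrative sketch under an assumption of our own. We assume the weaker realizability condition means that some $k$ classifiers $h_1, \dots, h_k$ from the base class $H$ together cover all $n$ distributions, each distribution being labeled by one of them. Under that assumption, a natural augmented class consists of maps $(i, x) \mapsto h_{\sigma(i)}(x)$ for some assignment $\sigma : [n] \to [k]$. The code runs brute-force ERM over this hypothetical class for a small finite $H$ (1-D thresholds). Every name in it (`augmented_erm`, `make_threshold`, the toy datasets) is hypothetical and is not the authors' method. Note that for a fixed $k$-tuple, the best assignment simply picks the lowest-error member for each dataset, but the search over tuples grows as $|H|^k$.

```python
from itertools import combinations_with_replacement
from typing import Callable, List, Sequence, Tuple

Hypothesis = Callable[[float], int]
Sample = Tuple[float, int]  # (feature x, binary label y)


def make_threshold(t: float) -> Hypothesis:
    """Hypothetical base hypothesis: predict 1 iff x >= t."""
    return lambda x: 1 if x >= t else 0


def empirical_errors(h: Hypothesis, samples: Sequence[Sample]) -> int:
    """Number of mistakes h makes on one dataset."""
    return sum(1 for x, y in samples if h(x) != y)


def augmented_erm(base_class: Sequence[Hypothesis],
                  datasets: Sequence[Sequence[Sample]],
                  k: int) -> Tuple[int, Tuple[int, ...], List[int]]:
    """Brute-force ERM over the ASSUMED augmented class
    {(i, x) -> h_{sigma(i)}(x) : h_1..h_k in base_class, sigma: [n] -> [k]}.
    Returns (total mistakes, indices of chosen hypotheses, index used per dataset)."""
    # errs[j][i] = mistakes of base hypothesis j on dataset i
    errs = [[empirical_errors(h, s) for s in datasets] for h in base_class]
    best = None
    # Enumerate all k-multisets of base hypotheses: |H|^k-type cost.
    for combo in combinations_with_replacement(range(len(base_class)), k):
        # For a fixed tuple, the optimal assignment is a per-dataset argmin.
        assignment = [min(combo, key=lambda j: errs[j][i])
                      for i in range(len(datasets))]
        total = sum(errs[j][i] for i, j in enumerate(assignment))
        if best is None or total < best[0]:
            best = (total, combo, assignment)
    return best


# Toy instance (hypothetical): thresholds t = 0, 1, 2, 3 as the base class.
base_class = [make_threshold(t) for t in (0, 1, 2, 3)]
datasets = [
    [(0, 0), (1, 1), (2, 1), (3, 1)],  # labeled by threshold t = 1
    [(0, 0), (1, 0), (2, 0), (3, 1)],  # labeled by threshold t = 3
    [(0.5, 0), (1.5, 1)],              # labeled by threshold t = 1
]

if __name__ == "__main__":
    print(augmented_erm(base_class, datasets, k=1))  # no single threshold fits all
    print(augmented_erm(base_class, datasets, k=2))  # two thresholds suffice
```

On this toy instance, no single threshold ($k = 1$) fits all three datasets. With $k = 2$, thresholds $t = 1$ and $t = 3$ achieve zero empirical error, with datasets 1 and 3 assigned to $t = 1$.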