We study the problem of crowdsourced PAC learning of threshold functions. This is a challenging problem, and only recently have query-efficient algorithms been established, under the assumption that a noticeable fraction of the workers are perfect. In this work, we investigate a more challenging case where the majority may behave adversarially and the rest behave according to Massart noise, a significant generalization of the perfectness assumption. We show that under the semi-verified model of Charikar et al. (2017), where we have (limited) access to a trusted oracle that always returns correct annotations, it is possible to PAC learn the underlying hypothesis class with a manageable number of label queries. Moreover, we show that the labeling cost can be drastically reduced via the more easily obtained comparison queries. Orthogonal to recent developments in semi-verified or list-decodable learning that crucially rely on data distributional assumptions, our PAC guarantee holds by exploring the wisdom of the crowd.