Common crowdsourcing systems average estimates of a latent quantity of interest provided by many crowdworkers to produce a group estimate. We develop a new approach -- predict-each-worker -- that leverages self-supervised learning and a novel aggregation scheme. This approach adapts weights assigned to crowdworkers based on estimates they provided for previous quantities. When skills vary across crowdworkers or their estimates correlate, the weighted sum offers a more accurate group estimate than the average. Existing algorithms such as expectation maximization can, at least in principle, produce similarly accurate group estimates. However, their computational requirements become onerous when complex models, such as neural networks, are required to express relationships among crowdworkers. Predict-each-worker accommodates such complexity as well as many other practical challenges. We analyze the efficacy of predict-each-worker through theoretical and computational studies. Among other things, we establish asymptotic optimality as the number of engagements per crowdworker grows.
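The claim that a weighted sum can beat the plain average when skills vary or estimates correlate can be illustrated with a small simulation. The sketch below is not the predict-each-worker algorithm. It assumes the noise covariance among three hypothetical crowdworkers is known in advance, whereas the approach described above adapts weights from estimates provided for previous quantities. The covariance values, the helper name `min_variance_weights`, and the Gaussian setup are all illustrative assumptions.

```python
import numpy as np


def min_variance_weights(noise_cov):
    """Weights that sum to 1 and minimize the variance of the weighted sum.

    Illustrative helper (not from the paper): solves cov @ w = 1, then normalizes.
    """
    ones = np.ones(noise_cov.shape[0])
    raw = np.linalg.solve(noise_cov, ones)
    return raw / raw.sum()


rng = np.random.default_rng(0)

# Assumed noise covariance: workers 0 and 1 are noisy and strongly correlated,
# worker 2 is more skilled (lower variance) and independent of the others.
noise_cov = np.array([
    [4.0, 3.6, 0.0],
    [3.6, 4.0, 0.0],
    [0.0, 0.0, 0.25],
])

n_tasks = 20_000
truth = rng.normal(size=n_tasks)                      # latent quantities
noise = rng.multivariate_normal(np.zeros(3), noise_cov, size=n_tasks)
estimates = truth[:, None] + noise                    # one column per worker

weights = min_variance_weights(noise_cov)

# Mean squared error of the simple average versus the weighted sum.
mse_average = np.mean((estimates.mean(axis=1) - truth) ** 2)
mse_weighted = np.mean((estimates @ weights - truth) ** 2)

print("weights:", weights)
print("MSE of average:     ", mse_average)
print("MSE of weighted sum:", mse_weighted)
```

In this setup the weighted sum places most of its weight on the skilled, independent worker and avoids double-counting the two correlated workers, so its error is lower than that of the average.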