Quality control is a crux of crowdsourcing. While most quality control measures are organizational, relying on worker selection, golden tasks, and post-acceptance, computational quality control techniques parameterize the whole crowdsourcing process of workers, tasks, and labels, inferring and revealing the relationships between them. In this paper, we present Crowd-Kit, a general-purpose toolkit for computational quality control in crowdsourcing. It provides efficient Python implementations of computational quality control algorithms for crowdsourcing, including data quality estimators and truth inference methods. We focus on aggregation methods for all the major annotation tasks, from categorical annotation, in which the latent label assumption is met, to more complex tasks such as image and sequence aggregation. We perform an extensive evaluation of our toolkit on several datasets of different natures, enabling computational quality control methods to be benchmarked in a uniform, systematic, and reproducible way using the same codebase. We release our code and data under an open-source license at https://github.com/Toloka/crowd-kit.
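To make the notions of truth inference and data quality estimation mentioned above more concrete, the following is a minimal sketch in plain Python. It is not Crowd-Kit's API: the function names `majority_vote` and `worker_agreement`, the `(task, worker, label)` triple format, and the toy annotations are all assumptions made for illustration. The sketch aggregates categorical labels by majority vote, the simplest baseline under the latent label assumption, and then scores each worker by how often they agree with the aggregated label.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Aggregate categorical labels by simple majority vote.

    `annotations` is an iterable of (task, worker, label) triples
    (a hypothetical input format chosen for this sketch).
    Ties are broken by taking the smallest label, so the result is deterministic.
    """
    votes = defaultdict(Counter)
    for task, _worker, label in annotations:
        votes[task][label] += 1

    aggregated = {}
    for task, counts in votes.items():
        top = max(counts.values())
        aggregated[task] = min(label for label, n in counts.items() if n == top)
    return aggregated


def worker_agreement(annotations, aggregated):
    """Estimate worker quality as the share of a worker's labels
    that match the aggregated label of the corresponding task."""
    hits = Counter()
    totals = Counter()
    for task, worker, label in annotations:
        totals[worker] += 1
        if aggregated[task] == label:
            hits[worker] += 1
    return {worker: hits[worker] / totals[worker] for worker in totals}


# Toy data: three workers label two tasks.
annotations = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "dog"),
]

labels = majority_vote(annotations)
quality = worker_agreement(annotations, labels)
print(labels)
print(quality)
```

More elaborate truth inference methods replace the uniform vote with weights or probabilistic models of worker reliability; the sketch only shows the baseline that such methods are typically compared against.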