Crowdsourcing is increasingly used for efficient and cost-effective large-scale data labeling. To ensure labeling quality, multiple annotations are collected for each data sample, and truth inference algorithms have been developed to accurately infer the true labels. Although previous studies have released public datasets for evaluating the efficacy of truth inference algorithms, these datasets typically cover only a single type of crowdsourcing task and discard the temporal information associated with workers' annotation activities. These limitations significantly restrict the practical applicability of such algorithms, particularly for long-term and online truth inference. In this paper, we introduce a large-scale crowdsourcing annotation dataset collected from a real-world crowdsourcing platform. The dataset comprises approximately two thousand workers, one million tasks, and six million annotations. The data were gathered over approximately six months across various task types, and the timestamp of each annotation is preserved. We analyze the characteristics of the dataset from multiple perspectives and evaluate the effectiveness of several representative truth inference algorithms on it. We anticipate that this dataset will stimulate future research on tracking workers' abilities over time across different task types, as well as on improving online truth inference.