Real-world classification data is often labeled by multiple annotators. To analyze such data, we introduce CROWDLAB, a straightforward approach that uses any trained classifier to estimate: (1) a consensus label for each example that aggregates the available annotations; (2) a confidence score for how likely each consensus label is to be correct; (3) a rating for each annotator that quantifies the overall correctness of their labels. Existing crowdsourcing algorithms for estimating related quantities often rely on sophisticated generative models with iterative inference, whereas CROWDLAB uses a straightforward weighted ensemble. Existing algorithms also often rely solely on annotator statistics and ignore the features of the examples from which the annotations derive. CROWDLAB instead uses any classifier trained on these features and can thus generalize better across examples with similar features. On real-world multi-annotator image data, our proposed method provides better estimates of (1)-(3) than existing algorithms such as Dawid-Skene and GLAD.
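To make the weighted-ensemble idea concrete, the following is a minimal sketch of how the three estimates could be produced from annotator labels and classifier-predicted class probabilities. It is an illustration under stated assumptions, not the CROWDLAB weighting scheme itself: the weights used here (each annotator's agreement rate with a majority vote, and the classifier's accuracy against that vote) are simplifications chosen for brevity, and the names `MISSING`, `majority_vote`, `agreement_rate`, and `weighted_ensemble` are hypothetical.

```python
import numpy as np

MISSING = -1  # hypothetical marker: the annotator did not label this example


def majority_vote(labels, n_classes):
    """Most frequent label per example, ignoring missing entries."""
    counts = np.stack([(labels == k).sum(axis=1) for k in range(n_classes)], axis=1)
    return counts.argmax(axis=1)  # ties resolve to the lowest class index


def agreement_rate(labels, reference):
    """Per-annotator fraction of provided labels that match `reference`."""
    given = labels != MISSING
    matches = (labels == reference[:, None]) & given
    return matches.sum(axis=0) / np.maximum(given.sum(axis=0), 1)


def weighted_ensemble(labels, pred_probs):
    """Return (consensus labels, confidence scores, annotator ratings).

    labels:     (n_examples, n_annotators) int array, MISSING where absent
    pred_probs: (n_examples, n_classes) probabilities from any trained classifier
    Assumes n_classes >= 2.
    """
    n_examples, n_classes = pred_probs.shape
    initial = majority_vote(labels, n_classes)

    # Assumed weights: trust each source in proportion to its agreement
    # with the initial majority-vote consensus.
    annotator_w = agreement_rate(labels, initial)
    model_w = (pred_probs.argmax(axis=1) == initial).mean()

    # Classifier contribution to the ensemble.
    total = model_w * pred_probs
    norm = np.full(n_examples, model_w, dtype=float)

    # Each annotator contributes a soft label: probability annotator_w[j]
    # on the chosen class, the remainder spread evenly over the others.
    for j in range(labels.shape[1]):
        idx = np.flatnonzero(labels[:, j] != MISSING)
        soft = np.full((idx.size, n_classes), (1.0 - annotator_w[j]) / (n_classes - 1))
        soft[np.arange(idx.size), labels[idx, j]] = annotator_w[j]
        total[idx] += annotator_w[j] * soft
        norm[idx] += annotator_w[j]

    probs = total / np.maximum(norm, 1e-12)[:, None]
    consensus = probs.argmax(axis=1)             # estimate (1)
    confidence = probs.max(axis=1)               # estimate (2)
    rating = agreement_rate(labels, consensus)   # estimate (3)
    return consensus, confidence, rating


# Toy data: 4 examples, 3 annotators, 2 classes (all values made up).
labels = np.array([
    [0, 0, 1],
    [1, 1, MISSING],
    [0, MISSING, 0],
    [1, 1, 0],
])
pred_probs = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.4, 0.6],
])
consensus, confidence, rating = weighted_ensemble(labels, pred_probs)
print(consensus, confidence.round(3), rating.round(3))
```

Because the classifier's predicted probabilities enter the ensemble for every example, an example labeled by only one annotator still receives information from examples with similar features, which annotator statistics alone cannot provide.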