Resolving disagreement in manual annotation typically involves removing unreliable annotators and applying a label aggregation strategy, such as majority vote or expert opinion, to assign a single gold label to each item. These steps may have the side effect of silencing or under-representing minority opinions that are nonetheless valid. In this paper, we study the impact of standard label aggregation strategies on the representation of minority opinions in sexism detection. We first investigate the quality and value of minority annotations, and then examine their effect on the class distributions of the gold labels and on the behaviour of models trained on the resulting datasets. Finally, we discuss the potential biases introduced by each method and how models can amplify them.
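As a minimal illustration of how aggregation can hide minority opinions (a sketch under stated assumptions, not the procedure used in the paper), the Python snippet below applies strict majority vote to a few hypothetical items, each labelled by five annotators, and reports the share of each item's annotations that disagree with the resulting gold label. The item names, the `sexist`/`not_sexist` label set, and the helper functions `majority_vote` and `minority_share` are all hypothetical.

```python
from collections import Counter

# Hypothetical annotations: each item was labelled by five annotators.
annotations = {
    "item_1": ["sexist", "sexist", "not_sexist", "sexist", "not_sexist"],
    "item_2": ["not_sexist", "not_sexist", "sexist", "not_sexist", "not_sexist"],
    "item_3": ["sexist", "sexist", "sexist", "sexist", "sexist"],
}


def majority_vote(labels):
    """Return the label with a strict majority (> half the votes), else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def minority_share(labels, gold):
    """Fraction of annotations that disagree with the aggregated gold label."""
    return sum(1 for label in labels if label != gold) / len(labels)


# Aggregate each item to a single gold label; minority annotations are discarded.
gold = {item: majority_vote(labels) for item, labels in annotations.items()}

for item, labels in annotations.items():
    # e.g. item_1: gold "sexist", but 40% of annotators said "not_sexist"
    print(item, gold[item], minority_share(labels, gold[item]))
```

In this toy data, two of five annotators on `item_1` disagree with the gold label, yet the aggregated dataset records the item exactly as it records the unanimous `item_3`; only the per-item disagreement share retains that difference.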