Longstanding data labeling practices in machine learning involve collecting and aggregating labels from multiple annotators. But what should we do when annotators disagree? Though annotator disagreement has long been seen as a problem to minimize, new perspectivist approaches challenge this assumption by treating disagreement as a valuable source of information. In this position paper, we examine practices and assumptions surrounding the causes of disagreement (some challenged by perspectivist approaches, and some that remain to be addressed), as well as practical and normative challenges for work operating under these assumptions. We conclude with recommendations for the data labeling pipeline and avenues for future research engaging with subjectivity and disagreement.