The prevalence and impact of toxic discussions online have made content moderation crucial. Automated systems can play a vital role in identifying toxicity and reducing the reliance on human moderation. Nevertheless, identifying toxic comments for diverse communities remains challenging, and this paper addresses these challenges. The goal of this study is twofold: (1) to identify intuitive variances in annotator disagreement through quantitative analysis, and (2) to model the subjectivity of these viewpoints. To achieve this goal, we release a new dataset\footnote{\url{https://github.com/XXX}} labeled by expert annotators and use two other public datasets to identify the subjectivity of toxicity. We then leverage a large language model (LLM) to evaluate how well a model can mimic diverse viewpoints on toxicity. We vary the size of the training data and test both on the same set of annotators used during training and on a separate set of annotators. We conclude that subjectivity is evident across all annotator groups, demonstrating the shortcomings of majority-rule voting. Moving forward, subjective annotations should serve as ground-truth labels for training models in domains such as toxicity in diverse communities.