The rise of online platforms has exacerbated the spread of hate speech, demanding scalable and effective detection. However, the accuracy of hate speech detection systems relies heavily on human-labeled data, which is inherently susceptible to biases. While previous work has examined the issue, the interplay between the characteristics of the annotator and those of the target of the hate remains unexplored. We fill this gap by leveraging an extensive dataset with rich socio-demographic information on both annotators and targets, uncovering how human biases manifest in relation to the target's attributes. Our analysis surfaces the presence of widespread biases, which we quantitatively describe and characterize by their intensity and prevalence, revealing marked differences. Furthermore, we compare human biases with those exhibited by persona-based LLMs. Our findings indicate that while persona-based LLMs do exhibit biases, these differ significantly from those of human annotators. Overall, our work offers new and nuanced results on human biases in hate speech annotations, as well as fresh insights into the design of AI-driven hate speech detection systems.