Offensive speech detection is a key component of content moderation. However, what counts as offensive can be highly subjective. This paper investigates how machine and human moderators disagree on what is offensive in real-world political discourse on the social web. We show that (1) there is extensive disagreement among the moderators (humans and machines); and (2) human and large-language-model classifiers are unable to predict how other human raters will respond based on those raters' political leanings. For (1), we conduct a noise audit at an unprecedented scale that combines both machine and human responses. For (2), we introduce a first-of-its-kind dataset of vicarious offense. Our noise audit reveals that moderation outcomes vary widely across different machine moderators. Our experiments with human moderators suggest that political leanings combined with sensitive issues affect both first-person and vicarious offense. The dataset is available at https://github.com/Homan-Lab/voiced.