Offensive speech detection is a key component of content moderation. However, what counts as offensive can be highly subjective. This paper investigates how machine and human moderators disagree on what is offensive in real-world political discourse on the social web. We show that (1) there is extensive disagreement among moderators, both human and machine; and (2) neither human raters nor large-language-model classifiers can predict how other human raters will respond based on their political leanings. For (1), we conduct a noise audit at an unprecedented scale that combines both machine and human responses. For (2), we introduce a first-of-its-kind dataset of vicarious offense. Our noise audit reveals that moderation outcomes vary wildly across different machine moderators. Our experiments with human moderators suggest that political leanings combined with sensitive issues affect both first-person and vicarious offense. The dataset is available through https://github.com/Homan-Lab/voiced.