Online hate speech is associated with substantial social harms, yet it remains unclear how consistently platforms enforce hate speech policies or whether enforcement is feasible at scale. We address these questions through a global audit of hate speech moderation on Twitter (now X). Using a complete 24-hour snapshot of public tweets, we construct representative samples comprising 540,000 tweets annotated for hate speech by trained annotators across eight major languages. Five months after posting, 80% of hateful tweets remain online, including explicitly violent hate speech. Such tweets are no more likely to be removed than non-hateful tweets, with neither severity nor visibility increasing the likelihood of removal. We then examine whether these enforcement gaps reflect technical limits of large-scale moderation systems. While fully automated detection systems cannot reliably identify hate speech without generating large numbers of false positives, they effectively prioritize likely violations for human review. Simulations of a human-AI moderation pipeline indicate that substantially reducing user exposure to hate speech is economically feasible at a cost below existing regulatory penalties. These results suggest that the persistence of online hate cannot be explained by technical constraints alone but also reflects institutional choices in the allocation of moderation resources.
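The contrast between fully automated removal and classifier-prioritized human review can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy scores and labels, the function names `auto_remove` and `human_review`, and the simplification that reviewers always judge correctly. It is not the paper's pipeline or its data.

```python
from typing import List, Tuple

# Hypothetical toy data: (classifier_score, is_hateful) pairs. Not from the paper.
TOY_TWEETS: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, False), (0.85, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, False), (0.20, False),
    (0.10, False), (0.05, False),
]


def auto_remove(scored: List[Tuple[float, bool]], threshold: float) -> Tuple[int, int]:
    """Fully automated policy: remove every tweet scoring at or above the threshold.

    Returns (correct removals, wrongful removals).
    """
    flagged = [label for score, label in scored if score >= threshold]
    true_pos = sum(flagged)
    return true_pos, len(flagged) - true_pos


def human_review(scored: List[Tuple[float, bool]], budget: int) -> Tuple[int, int]:
    """Human-in-the-loop policy: reviewers inspect the `budget` highest-scored
    tweets and remove only those they confirm as hateful.

    Assumes perfectly accurate reviewers (a simplification).
    Returns (removals, tweets reviewed).
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    reviewed = ranked[:budget]
    removed = sum(label for _, label in reviewed)
    return removed, len(reviewed)


if __name__ == "__main__":
    print(auto_remove(TOY_TWEETS, threshold=0.5))   # (3, 2)
    print(human_review(TOY_TWEETS, budget=5))       # (3, 5)
```

On this toy data, thresholding at 0.5 removes all three hateful tweets but also two non-hateful ones, whereas reviewing the five highest-scored tweets removes the same three with no wrongful removals. The cost moves from false positives to reviewer time, which is the trade-off the simulations in the abstract price out.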