Machine learning (ML) is widely used to moderate online content. Although ML scales better than human moderation, it introduces distinct challenges to content moderation. One such challenge is predictive multiplicity: multiple competing models for content classification may perform equally well on average, yet assign conflicting predictions to the same content. This multiplicity can result from seemingly innocuous choices during model development, such as the random seed used for parameter initialization. We experimentally demonstrate how content moderation tools can arbitrarily classify samples as toxic, leading to arbitrary restrictions on speech. We discuss these findings in terms of human rights set out by the International Covenant on Civil and Political Rights (ICCPR), namely freedom of expression, non-discrimination, and procedural justice. We analyze (i) the extent of predictive multiplicity among state-of-the-art LLMs used for detecting toxic content; (ii) the disparate impact of this arbitrariness across social groups; and (iii) how model multiplicity compares to unambiguous human classifications. Our findings indicate that scaled-up algorithmic moderation risks legitimizing an "algorithmic leviathan", in which an algorithm disproportionately manages human rights. To mitigate such risks, our study underscores the need to identify arbitrariness in content moderation applications and make it more transparent. Because algorithmic content moderation is driven by pressing social concerns, such as disinformation and hate speech, our discussion of harms is relevant to policy debates. Our findings also inform the content moderation and intermediary liability laws being discussed and passed in many countries, such as the Digital Services Act in the European Union, the Online Safety Act in the United Kingdom, and the Fake News Bill in Brazil.
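To make predictive multiplicity concrete, the following is a minimal sketch rather than the study's actual method. It assumes three hypothetical toxicity classifiers (`model_a`, `model_b`, `model_c`, for example the same architecture trained with different random seeds) and hand-written toy predictions. The `ambiguity` function, a name chosen here for illustration, reports the share of samples on which at least two models disagree.

```python
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Share of samples where the prediction matches the label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def ambiguity(model_preds: Sequence[Sequence[int]]) -> float:
    """Share of samples that receive conflicting labels from at least two models."""
    n_samples = len(model_preds[0])
    conflicted = sum(
        1
        for i in range(n_samples)
        if len({preds[i] for preds in model_preds}) > 1
    )
    return conflicted / n_samples


# Toy data (assumed for illustration): 1 = toxic, 0 = non-toxic.
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Hypothetical predictions from three competing models.
# Each model makes exactly two mistakes, but on different samples.
model_a = [1, 0, 1, 0, 1, 0, 0, 1]
model_b = [1, 0, 1, 0, 0, 1, 1, 0]
model_c = [0, 1, 1, 0, 1, 0, 1, 0]

models = [model_a, model_b, model_c]

for name, preds in zip("abc", models):
    print(f"model_{name} accuracy: {accuracy(preds, labels):.2f}")

print(f"ambiguity: {ambiguity(models):.2f}")
```

In this toy setup all three models reach the same accuracy (0.75), yet they agree on only two of the eight samples. Whether a given post is flagged as toxic therefore depends on which of the equally accurate models happens to be deployed.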