A content-moderation system can score well on every standard accuracy metric and still cause real harm if its mistakes fall on the few users who connect otherwise separate communities ("bridge" users). We show this with an agent-based model in which N=240 learning agents on a community-structured network each post harmless, productive, or dangerous content, and a regulator removes or penalizes whatever a noisy classifier flags. Overall usefulness barely moves as the type of classifier noise changes (one-way ANOVA, p=0.96): by aggregate measures, nothing looks wrong. The damage instead concentrates on bridge users, whose useful posts are wrongly suppressed and whose dangerous posts are wrongly spared. A governance loss (L_gov) that prices these two mistakes separately from the cost of enforcement more than doubles under false-positive-heavy noise. Aggregate accuracy hides who is harmed, and the cheap quantity to audit is a user's number of connections (degree), a near-perfect proxy for the betweenness centrality that defines a bridge (r=0.96).