Toxicity detection algorithms, originally designed with reactive content moderation in mind, are increasingly being deployed in proactive end-user interventions that moderate content. Through a socio-technical lens, and focusing on the contexts in which they are applied, we explore the use of these algorithms in proactive moderation systems. Placing a toxicity detection algorithm in an imagined virtual mobile keyboard, we critically explore how such algorithms could be used to proactively reduce the sending of toxic content. We present findings from design workshops conducted with four distinct stakeholder groups and identify concerns about how contextual complexities may exacerbate inequalities in content moderation processes. Whilst only specific user groups are likely to benefit directly from these interventions, we highlight the potential for other groups to misuse them to circumvent detection, validate and gamify hate, and manipulate algorithmic models to exacerbate harm.