Toxicity detection algorithms, originally designed with reactive content moderation in mind, are increasingly being deployed in proactive end-user interventions to moderate content. Through a socio-technical lens, and focusing on the contexts in which they are applied, we explore the use of these algorithms in proactive moderation systems. Placing a toxicity detection algorithm in an imagined virtual mobile keyboard, we critically explore how such algorithms could be used to proactively reduce the sending of toxic content. We present findings from design workshops conducted with four distinct stakeholder groups and find concerns around how contextual complexities may exacerbate inequalities around content moderation processes. Whilst only specific user groups are likely to directly benefit from these interventions, we highlight the potential for other groups to misuse them to circumvent detection, validate and gamify hate, and manipulate algorithmic models to exacerbate harm.