Counterspeech offers direct rebuttals to hateful speech by challenging perpetrators of hate and showing support to targets of abuse. It provides a promising alternative to more contentious measures, such as content moderation and deplatforming, by contributing a greater amount of positive online speech rather than attempting to mitigate harmful content through removal. Advances in large language models mean that counterspeech could be produced more efficiently by automating its generation, which would enable large-scale online campaigns. However, we currently lack a systematic understanding of several important factors relating to the efficacy of counterspeech for hate mitigation, such as which types of counterspeech are most effective, what the optimal conditions for implementation are, and which specific effects of hate it can best ameliorate. This paper aims to fill this gap by systematically reviewing counterspeech research in the social sciences and comparing its methodologies and findings with computer science efforts in automatic counterspeech generation. By taking this multi-disciplinary view, we identify promising future directions in both fields.