Counterspeech has gained attention as a strategy to reduce hate speech on social media. Although previous studies suggest that counterspeech can reduce hate speech, little is known about its effects on participation in online hate communities. Relatedly, we lack an understanding of the degree of hostility in counterspeech. Hostile counterspeech may increase online conflict, potentially hardening the positions of hate adherents and further eroding online environments. Here, we analyzed the effect of counterspeech on 16,513 newcomers across 104 hate subreddits (forums within Reddit.com). We devised an LLM-based counterspeech detection approach that outperforms specialized models trained on existing datasets, then examined the presence and effects of hostility in counterspeech. While counterspeech comments are less toxic than hate speech comments, they are almost twice as toxic as other discourse within hate subreddits. We then evaluated the effect of counterspeech on newcomer engagement in hate subreddits. We found that newcomers using hate speech who receive counterspeech are less likely to continue posting within these hate subreddits, rather than becoming galvanized. We speculate that, instead of being ardent hate adherents, readily dissuaded newcomers may merely be toying with beliefs that are proscribed in other contexts. We found no association between the toxicity of counterspeech and its effects on user retention. However, consistent with prior research regarding the harmful effects of toxic speech, we found that toxic counterspeech increases the probability of continued hostility from hate users within the same discussion.