How can citizens address hate in online discourse? We analyze a large corpus of more than 130,000 discussions on Twitter over four years. With the help of human annotators, language models and machine learning classifiers, we identify different dimensions of discourse that might be related to the probability of hate speech in subsequent tweets. We use a matching approach and longitudinal statistical analyses to discern the effectiveness of different counter speech strategies on the micro-level (individual tweet pairs), meso-level (discussion trees) and macro-level (days) of discourse. We find that expressing simple opinions, not necessarily supported by facts, but without insults, relates to the least hate in subsequent discussions. Sarcasm can be helpful as well, in particular in the presence of organized extreme groups. Mentioning either outgroups or ingroups is typically related to a deterioration of discourse. A pronounced emotional tone, either negative such as anger or fear, or positive such as enthusiasm and pride, also leads to worse discourse quality. We obtain similar results for other measures of quality of discourse beyond hate speech, including toxicity, extremity of speech, and the presence of extreme speakers. Going beyond one-shot analyses on smaller samples of discourse, our findings have implications for the successful management of online commons through collective civic moderation.