User-generated replies to hate speech are a promising means of combating hatred, but questions linger about whether they can stop incivility in follow-up conversations. We argue that effective replies stop incivility from emerging in follow-up conversations, whereas replies that elicit more incivility are counterproductive. This study introduces the task of predicting the incivility of conversations following replies to hate speech. We first propose a metric to measure conversation incivility based on the number of civil and uncivil comments as well as the number of unique authors involved in the discourse. Our metric approximates human judgments more accurately than previous metrics. We then use the metric to evaluate the outcomes of replies to hate speech. A linguistic analysis uncovers differences in the language of replies that elicit follow-up conversations with high and low incivility. Experimental results show that forecasting incivility is challenging. We close with a qualitative analysis that sheds light on the most common errors made by the best model.