The increasing volume of hate speech on online platforms poses significant societal challenges. While the Natural Language Processing community has developed effective methods to automatically detect hate speech, generating responses to it, known as counter-speech, remains an open challenge. We present PEACE 2.0, a novel tool that, besides analysing and explaining why a message is considered hateful or not, also generates a response to it. More specifically, PEACE 2.0 has three main new functionalities. Leveraging a Retrieval-Augmented Generation (RAG) pipeline, it i) grounds hate speech (HS) explanations in evidence and facts and ii) automatically generates evidence-grounded counter-speech; it also iii) supports exploring the characteristics of counter-speech replies. By integrating these capabilities, PEACE 2.0 enables in-depth analysis and response generation for both explicit and implicit hateful messages.