Counter-speech generation is at the core of many expert activities that counter harmful content, such as fact-checking and responding to hate speech. Yet existing work treats counter-speech generation as a pure text generation task, relying mainly on Large Language Models (LLMs) or NGO experts. These approaches have severe drawbacks: limited reliability and coherence of the generated text in the former case, and limited scalability in the latter. To close this gap, we introduce a novel framework that models counter-speech generation as a knowledge-wise text generation process. Our framework integrates advanced Retrieval-Augmented Generation (RAG) pipelines to ensure the generation of trustworthy counter-speech for the 8 main target groups identified in the hate speech literature: women, people of colour, persons with disabilities, migrants, Muslims, Jews, LGBT persons, and "other". We built a knowledge base over the United Nations Digital Library, EUR-Lex and the EU Agency for Fundamental Rights, comprising a total of 32,792 texts. We use the MultiTarget-CONAN dataset to empirically assess the quality of the generated counter-speech, both through standard metrics (i.e., JudgeLM) and a human evaluation. Results show that our framework outperforms standard LLM baselines and competitive approaches on both assessments. The resulting framework and knowledge base pave the way for studying trustworthy and sound counter-speech generation, in hate speech and beyond.
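To make the retrieve-then-generate idea concrete, the following is a minimal sketch of the retrieval and prompt-assembly steps of a generic RAG pipeline. It is not the framework described above: the passages are hypothetical placeholders rather than excerpts from the actual knowledge base, the bag-of-words cosine scoring stands in for whatever retriever the framework uses, and the names `KNOWLEDGE_BASE`, `retrieve` and `build_prompt` are assumptions introduced for illustration. The final call to a language model is left out.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder passages; NOT real excerpts from the
# UN Digital Library, EUR-Lex or the EU Agency for Fundamental Rights.
KNOWLEDGE_BASE = [
    "Placeholder passage about equal treatment of migrants in employment.",
    "Placeholder passage about accessibility rights for persons with disabilities.",
    "Placeholder passage about freedom of religion and protection from discrimination.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, k=2):
    """Return up to k passages with non-zero similarity to the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(message, evidence):
    """Assemble a prompt asking a model to ground its reply in numbered evidence."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(evidence, start=1))
    return (
        "Write a respectful counter-speech reply to the message below, "
        "relying only on the numbered evidence.\n\n"
        f"Evidence:\n{numbered}\n\n"
        f"Message:\n{message}\n"
    )


if __name__ == "__main__":
    message = "A message making claims about migrants and employment."
    print(build_prompt(message, retrieve(message, KNOWLEDGE_BASE)))
```

In a fuller pipeline, the string returned by `build_prompt` would be passed to an LLM, and the retriever would typically be replaced by a dense or hybrid index over the full collection.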