Hate speech (HS) is a critical issue in online discourse, and one promising strategy to counter it is the use of counter-narratives (CNs). Datasets linking HS with CNs are essential for advancing counterspeech research. However, even flagship resources like CONAN (Chung et al., 2019) annotate only a sparse subset of all possible HS-CN pairs, which limits evaluation. We introduce FC-CONAN (Fully Connected CONAN), the first dataset created by exhaustively considering all combinations of 45 English HS messages and 129 CNs. A two-stage annotation process involving nine annotators and four validators produces four partitions (Diamond, Gold, Silver, and Bronze) that balance reliability and scale. None of the labeled pairs overlaps with CONAN, so the dataset uncovers hundreds of previously unlabeled positives. FC-CONAN enables more faithful evaluation of counterspeech retrieval systems and facilitates detailed error analysis. The dataset is publicly available.
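To make the "fully connected" design concrete, the sketch below enumerates every HS-CN combination for the sizes stated in the abstract (45 HS messages, 129 CNs). The identifiers `HS0`, `CN0`, and the helper `all_pairs` are hypothetical placeholders for illustration; they are not the dataset's actual IDs or the authors' tooling.

```python
from itertools import product

NUM_HS = 45    # number of hate speech messages, as stated in the abstract
NUM_CN = 129   # number of counter-narratives, as stated in the abstract


def all_pairs(hs_ids, cn_ids):
    """Return every (HS, CN) combination, i.e. the full Cartesian product."""
    return list(product(hs_ids, cn_ids))


# Hypothetical placeholder identifiers; the real dataset may use other keys.
hs_ids = [f"HS{i}" for i in range(NUM_HS)]
cn_ids = [f"CN{j}" for j in range(NUM_CN)]

pairs = all_pairs(hs_ids, cn_ids)
print(len(pairs))  # 45 * 129 = 5805 candidate pairs
```

Under these assumptions, exhaustive pairing yields 5,805 candidate pairs to consider, which is the space a sparse annotation scheme covers only in part.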