Counter narratives - informed responses to hate speech designed to refute hateful claims and de-escalate encounters - have emerged as an effective hate speech intervention strategy. While previous work has proposed automatic counter narrative generation methods to aid manual interventions, the evaluation of these approaches remains underdeveloped. Existing automatic metrics correlate poorly with human judgment because they rely on superficial reference comparisons rather than assessing the key aspects of counter narrative quality. To address these limitations, we propose a novel evaluation framework that prompts LLMs to provide scores and feedback for generated counter narrative candidates along five aspects derived from guidelines published by NGOs specializing in counter narratives. We find that LLM evaluators align strongly with human-annotated scores and feedback and outperform alternative metrics, indicating their potential as multi-aspect, reference-free, and interpretable evaluators of counter narrative quality.
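As a minimal sketch of how such a multi-aspect, reference-free LLM evaluation could be wired up: the code below assembles a per-aspect scoring prompt and parses a score/feedback pair out of a free-form model reply. The aspect names, prompt wording, JSON schema, and the idea of calling one aspect at a time are illustrative assumptions, not the paper's actual protocol.

```python
import json
import re

# Placeholder aspect names (assumptions); the paper derives its five aspects
# from NGO counter-narrative guidelines.
ASPECTS = ["Relatedness", "Specificity", "Richness", "Coherence", "Informativeness"]

def build_eval_prompt(hate_speech: str, counter_narrative: str, aspect: str) -> str:
    """Assemble a reference-free evaluation prompt for a single aspect."""
    return (
        f"Rate the counter narrative below on {aspect} from 1 to 5 and explain why.\n"
        f"Hate speech: {hate_speech}\n"
        f"Counter narrative: {counter_narrative}\n"
        'Answer as JSON: {"score": <1-5>, "feedback": "<one sentence>"}'
    )

def parse_eval(raw_reply: str) -> tuple[int, str]:
    """Extract (score, feedback) from a possibly chatty LLM reply.

    Finds the first JSON object in the reply and validates the score range.
    """
    match = re.search(r"\{.*\}", raw_reply, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in LLM output")
    obj = json.loads(match.group(0))
    score = int(obj["score"])
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return score, obj.get("feedback", "")
```

The actual LLM call (any chat-completion API) would sit between these two helpers; averaging the per-aspect scores, or keeping them separate for interpretability, is then a design choice left to the evaluation setup.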