Current automated fact-checking (AFC) approaches commonly evaluate evidence either implicitly, via the predicted verdicts, or by comparing retrieved evidence against a predefined closed knowledge source such as Wikipedia. However, these methods suffer from limitations stemming from their reliance on evaluation metrics developed for other purposes and from the constraints imposed by closed knowledge sources. Recent advances in natural language generation (NLG) evaluation offer new possibilities for evidence assessment. In this work, we introduce Ev2R, an evaluation framework for AFC that comprises three types of approaches for evidence evaluation: reference-based, proxy-reference, and reference-less. We evaluate their effectiveness through agreement with human ratings and adversarial tests, and demonstrate that prompt-based scorers, particularly those leveraging LLMs and reference evidence, outperform traditional evaluation approaches.