We explore the relationship between factuality and Natural Language Inference (NLI) by introducing FactRel -- a novel annotation scheme that models \textit{factual} rather than \textit{textual} entailment -- and use it to annotate a dataset of naturally occurring sentences from news articles. Our analysis shows that 84\% of factually supporting pairs and 63\% of factually undermining pairs do not amount to NLI entailment or contradiction, respectively, suggesting that factual relationships are more apt for analyzing media discourse. We experiment with models for pairwise classification on the new dataset, and find that in some cases, generating synthetic data with GPT-4 on the basis of the annotated dataset can improve performance. Surprisingly, few-shot learning with GPT-4 yields strong results, on par with medium-sized LMs (DeBERTa) trained on the labelled dataset. We hypothesize that these results indicate the fundamental dependence of this task on both world knowledge and advanced reasoning abilities.