We explore the relationship between factuality and Natural Language Inference (NLI) by introducing FactRel -- a novel annotation scheme that models \textit{factual} rather than \textit{textual} entailment -- and use it to annotate a dataset of naturally occurring sentences from news articles. Our analysis shows that 84\% of factually supporting pairs and 63\% of factually undermining pairs do not amount to NLI entailment or contradiction, respectively, suggesting that factual relationships are more apt for analyzing media discourse. We experiment with models for pairwise classification on the new dataset, and find that in some cases, generating synthetic data with GPT-4 on the basis of the annotated dataset can improve performance. Surprisingly, few-shot learning with GPT-4 yields strong results, on par with medium-sized LMs (DeBERTa) trained on the labelled dataset. We hypothesize that these results indicate the fundamental dependence of this task on both world knowledge and advanced reasoning abilities.