Identifying Factual Inconsistency in Summaries: Towards Effective Utilization of Large Language Model

Factual inconsistency poses a significant hurdle for the commercial deployment of abstractive summarizers. Under this Large Language Model (LLM) era, this work focuses around two important questions: what is the best way to leverage LLM for factual inconsistency detection, and how could we distill a smaller LLM with both high efficiency and efficacy? Three zero-shot paradigms are firstly proposed and evaluated across five diverse datasets: direct inference on the entire summary or each summary window; entity verification through question generation and answering. Experiments suggest that LLM itself is capable to resolve this task train-free under the proper paradigm design, surpassing strong trained baselines by 2.8% on average. To further promote practical utility, we then propose training strategies aimed at distilling smaller open-source LLM that learns to score the entire summary at once with high accuracy, which outperforms the zero-shot approaches by much larger LLM, serving as an effective and efficient ready-to-use scorer.

翻译：事实不一致性对抽象式摘要生成器的商业部署构成了重大障碍。在大语言模型时代，本研究聚焦于两个重要问题：如何最佳利用大语言模型检测事实不一致性，以及如何蒸馏出一个兼具高效率与高效能的小型大语言模型？我们首先提出三种零样本范式，并在五个多样化数据集上进行评估：直接推理整个摘要或每个摘要窗口；通过问题生成与回答进行实体验证。实验表明，在恰当的范式设计下，大语言模型本身能够在无需训练的情况下解决此任务，平均超过强训练基线2.8%。为进一步提升实用性，我们随后提出训练策略，旨在蒸馏出能够一次性高精度评分整个摘要的小型开源大语言模型，其性能优于采用更大大语言模型的零样本方法，成为高效且有效的即用型评分器。

相关内容

大语言模型

关注 67

大语言模型是基于海量文本数据训练的深度学习模型。它不仅能够生成自然语言文本，还能够深入理解文本含义，处理各种自然语言任务，如文本摘要、问答、翻译等。2023年，大语言模型及其在人工智能领域的应用已成为全球科技研究的热点，其在规模上的增长尤为引人注目，参数量已从最初的十几亿跃升到如今的一万亿。参数量的提升使得模型能够更加精细地捕捉人类语言微妙之处，更加深入地理解人类语言的复杂性。在过去的一年里，大语言模型在吸纳新知识、分解复杂任务以及图文对齐等多方面都有显著提升。随着技术的不断成熟，它将不断拓展其应用范围，为人类提供更加智能化和个性化的服务，进一步改善人们的生活和生产方式。

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

14+阅读 · 2022年3月12日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日