Factual inconsistencies pose a significant hurdle to faithful summarization by generative models. While a major direction for enhancing inconsistency detection is to derive stronger Natural Language Inference (NLI) models, we propose an orthogonal aspect that underscores the importance of incorporating a task-specific taxonomy into the inference. To this end, we consolidate key error types of inconsistent facts in summaries and incorporate them to facilitate both the zero-shot and supervised paradigms of LLMs. Extensive experiments on ten datasets from five distinct domains suggest that zero-shot LLM inference benefits from the explicit solution space depicted by the error-type taxonomy and achieves state-of-the-art performance overall, surpassing specialized non-LLM baselines as well as recent LLM baselines. We further distill models that fuse the taxonomy into their parameters through our designed prompt completions and supervised training strategies, efficiently substituting for state-of-the-art zero-shot inference with much larger LLMs.