Despite the remarkable performance of generative large language models (LLMs) on abstractive summarization, they face two significant challenges: their considerable size and their tendency to hallucinate. Hallucinations are concerning because they erode reliability and raise safety issues. Pruning is a technique that reduces model size by removing redundant weights, enabling more efficient sparse inference. Pruned models achieve downstream task performance comparable to that of the original models, making them attractive alternatives when operating on a limited compute budget. However, the effect of pruning on hallucinations in abstractive summarization with LLMs has yet to be explored. In this paper, we provide an extensive empirical study across five summarization datasets, two state-of-the-art pruning methods, and five instruction-tuned LLMs. Surprisingly, we find that hallucinations are less prevalent in pruned LLMs than in the original models. Our analysis suggests that pruned models tend to depend more on the source document when generating summaries. This leads to higher lexical overlap between the generated summary and the source document, which could be one reason for the reduced hallucination risk.
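To make the notion of lexical overlap concrete, the following is a minimal sketch of one way it could be measured: the fraction of summary n-grams that also occur in the source document. The abstract does not specify which overlap metric the study uses, so the function names (`tokenize`, `ngrams`, `overlap_ratio`), the simple regex tokenizer, and the choice of n-gram precision are all illustrative assumptions rather than the authors' method.

```python
import re


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; a real study would likely
    # use a proper tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    # All contiguous n-grams of the token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def overlap_ratio(source, summary, n=1):
    """Fraction of summary n-grams that also appear in the source.

    Hypothetical illustrative metric: higher values mean the summary
    reuses more of the source wording.
    """
    source_ngrams = set(ngrams(tokenize(source), n))
    summary_ngrams = ngrams(tokenize(summary), n)
    if not summary_ngrams:
        return 0.0
    hits = sum(1 for gram in summary_ngrams if gram in source_ngrams)
    return hits / len(summary_ngrams)


if __name__ == "__main__":
    source = "The cat sat on the mat near the door."
    extractive = "The cat sat on the mat."
    divergent = "A dog slept on the sofa."
    print(overlap_ratio(source, extractive))       # every unigram is in the source
    print(overlap_ratio(source, divergent))        # only "on" and "the" are shared
    print(overlap_ratio(source, divergent, n=2))   # only the bigram ("on", "the") is shared
```

Under this sketch, a summary that copies more of the source wording scores closer to 1, while a summary that introduces content absent from the source scores lower. High overlap alone does not guarantee faithfulness, so a measure like this is best read as a diagnostic signal alongside dedicated hallucination metrics.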