Although large language models (LLMs) have demonstrated impressive performance on various tasks, they still suffer from factual inconsistency, a problem known as hallucination. For instance, LLMs occasionally generate content that diverges from the source article, and they tend to extract information that appears at the beginning and end of the context, especially in long-document summarization. Inspired by these findings, we propose to improve the faithfulness of LLMs in summarization by impelling them to process the entire article more fairly and faithfully. We present a novel summary generation strategy, namely SliSum, which exploits the ideas of sliding windows and self-consistency. Specifically, SliSum divides the source article into overlapping windows and utilizes the LLM to generate local summaries for the content in each window. Finally, SliSum aggregates all local summaries using clustering and a majority voting algorithm to produce a more faithful summary of the entire article. Extensive experiments demonstrate that SliSum significantly improves the faithfulness of diverse LLMs, including LLaMA-2, Claude-2, and GPT-3.5, in both short- and long-text summarization, while maintaining their fluency and informativeness, without additional fine-tuning or resources. We further conduct qualitative and quantitative studies to investigate why SliSum works and how its hyperparameters affect performance.
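The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `summarize` callable stands in for an LLM, and the window size, step, lexical-similarity clustering, and vote threshold are all illustrative assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Lexical similarity between two statements (illustrative proxy
    for whatever similarity measure the real system would use)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def split_windows(sentences, window_size=4, step=2):
    """Divide the source article into overlapping windows of sentences."""
    windows, start = [], 0
    while start < len(sentences):
        windows.append(sentences[start:start + window_size])
        if start + window_size >= len(sentences):
            break
        start += step
    return windows

def aggregate(local_summaries, sim_threshold=0.5, min_votes=2):
    """Cluster statements from all local summaries, then keep one
    representative per cluster that enough windows agree on
    (majority voting / self-consistency)."""
    clusters = []  # each cluster is a list of similar statements
    for summary in local_summaries:
        for stmt in summary:
            for cluster in clusters:
                if jaccard(stmt, cluster[0]) >= sim_threshold:
                    cluster.append(stmt)
                    break
            else:
                clusters.append([stmt])
    return [c[0] for c in clusters if len(c) >= min_votes]

def slisum_sketch(sentences, summarize, **kw):
    """End-to-end sketch: window -> local summaries -> aggregate."""
    local = [summarize(w) for w in split_windows(sentences)]
    return aggregate(local, **kw)
```

Statements that recur across overlapping windows survive the vote, while a claim produced by only one window is dropped, which is the intuition behind why aggregation suppresses hallucinated content.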