Large Language Models (LLMs) have significantly advanced text generation, producing coherent and fluent outputs on tasks such as summarization. However, faithfulness to the source material remains a significant challenge because of hallucinations. While extensive research has focused on detecting and reducing these inaccuracies, far less attention has been paid to the positional distribution of hallucinations within generated text, particularly in long outputs. In this work, we investigate where hallucinations occur in LLM-generated long responses, using long-document summarization as a key case study. Focusing on the challenging setting of long-context, long-form response generation, we find a consistent and concerning phenomenon: hallucinations concentrate disproportionately in the latter parts of the generated response. To understand this bias, we examine potential contributing factors related to attention and decoding dynamics over long sequences. Finally, we investigate methods to mitigate this positional hallucination, aiming to improve faithfulness specifically in the concluding segments of long outputs.