Large Language Models (LLMs) are proficient at retrieving single facts from extended contexts, yet they struggle with tasks that require retrieving multiple facts simultaneously, especially during generation. This paper identifies a novel "lost-in-the-middle" phenomenon, in which LLMs progressively lose track of critical information over the course of generation, resulting in incomplete or inaccurate retrieval. To address this challenge, we introduce Find All Crucial Texts (FACT), an iterative retrieval method that refines the context through successive rounds of rewriting. This approach enables models to incrementally capture essential facts that are often overlooked in single-pass retrieval. Experiments demonstrate that FACT substantially improves multi-fact retrieval performance across various tasks, though the gains are less pronounced in general-purpose QA scenarios. Our findings shed light on the limitations of LLMs in multi-fact retrieval and underscore the need for more resilient long-context retrieval strategies.