When completing knowledge-intensive tasks, humans sometimes need not just an answer but also a corresponding reference passage for auxiliary reading. Previous methods obtain pre-segmented article chunks through additional retrieval models. This paper explores leveraging the parameterized knowledge stored during the pre-training phase of large language models (LLMs) to independently recall reference passages from any starting position. We propose a two-stage framework that simulates how humans recall easily forgotten references. First, the LLM is prompted to recall document title identifiers, yielding a coarse-grained document set. Then, conditioned on this document set, it recalls fine-grained passages. Throughout the two-stage recall process, we apply constrained decoding to ensure that no content outside the stored documents is generated. To improve speed, the second stage recalls only a short prefix, whose position is then located to retrieve the complete passage. Experiments on KILT knowledge-intensive tasks verify that LLMs can independently recall reference passages across various task forms, and that the obtained references significantly assist downstream tasks.
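The two mechanisms in the abstract can be sketched as follows. This is an illustrative outline, not the paper's implementation: a prefix trie built over tokenized document titles restricts which tokens the decoder may emit next (constrained decoding), and a recalled short prefix is located inside the source document to expand it into a complete passage. The function names (`build_trie`, `allowed_next_tokens`, `expand_prefix`) and the character-level passage expansion are assumptions for illustration.

```python
def build_trie(sequences):
    """Build a prefix trie over tokenized sequences (e.g. document titles)."""
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node[None] = {}  # end-of-sequence marker
    return root

def allowed_next_tokens(trie, prefix):
    """Tokens the decoder may emit after `prefix` without leaving the trie.

    During constrained decoding, logits of all other tokens would be
    masked to -inf so only stored content can be generated.
    """
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            return set()  # prefix left the trie; nothing is allowed
    return {tok for tok in node if tok is not None}

def expand_prefix(document, prefix, passage_len):
    """Locate a recalled short prefix in a document and return the
    complete passage starting at that position (second-stage speedup)."""
    start = document.find(prefix)
    if start == -1:
        return None  # recalled prefix not found in this document
    return document[start:start + passage_len]
```

For example, with titles `["New", "York", "City"]` and `["New", "Zealand"]` in the trie, after emitting `"New"` the decoder may only continue with `"York"` or `"Zealand"`; generating a few prefix tokens and calling `expand_prefix` then avoids decoding the entire passage autoregressively.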