Retrieving documents and prepending them in-context at inference time improves the performance of language models (LMs) on a wide range of tasks. However, these documents, often spanning hundreds of words, make inference substantially more expensive. We propose compressing the retrieved documents into textual summaries prior to in-context integration. This not only reduces computational cost but also relieves LMs of the burden of identifying relevant information in long retrieved documents. We present two compressors: an extractive compressor, which selects useful sentences from the retrieved documents, and an abstractive compressor, which generates summaries by synthesizing information from multiple documents. Both compressors are trained to improve LMs' performance on end tasks when the generated summaries are prepended to the LMs' input, while keeping the summaries concise. If the retrieved documents are irrelevant to the input or offer no additional information to the LM, our compressor can return an empty string, implementing selective augmentation. We evaluate our approach on language modeling and open-domain question answering tasks. We achieve a compression rate as low as 6% with minimal loss in performance on both tasks, significantly outperforming off-the-shelf summarization models. We show that our compressors trained for one LM can transfer to other LMs on the language modeling task and that they provide summaries largely faithful to the retrieved documents.
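To make the extractive setting and selective augmentation concrete, the following is a minimal sketch and not the trained compressor described above. It assumes a hypothetical token-overlap scorer (`overlap_score`) in place of a learned model, a naive regex sentence splitter, and illustrative `top_k` and `threshold` parameters; none of these names or values come from the paper. It selects the highest-scoring sentences, returns an empty string when no sentence clears the threshold, and reports the compression rate as summary tokens divided by retrieved-document tokens.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(document):
    """Naive sentence splitter on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def overlap_score(query, sentence):
    """Hypothetical scorer: fraction of query tokens that appear in the sentence."""
    query_tokens = set(tokenize(query))
    if not query_tokens:
        return 0.0
    return len(query_tokens & set(tokenize(sentence))) / len(query_tokens)


def extractive_compress(query, documents, top_k=1, threshold=0.5):
    """Keep up to top_k sentences scoring >= threshold; return "" if none qualify."""
    candidates = [s for d in documents for s in split_sentences(d)]
    scored = [(overlap_score(query, s), i, s) for i, s in enumerate(candidates)]
    # Rank by score (ties broken by position), then keep the best top_k.
    kept = sorted(
        (item for item in scored if item[0] >= threshold),
        key=lambda item: (-item[0], item[1]),
    )[:top_k]
    # Restore document order so the summary reads naturally.
    kept.sort(key=lambda item: item[1])
    return " ".join(sentence for _, _, sentence in kept)


def compression_rate(summary, documents):
    """Summary token count divided by total retrieved-document token count."""
    total = sum(len(tokenize(d)) for d in documents)
    return len(tokenize(summary)) / total if total else 0.0


if __name__ == "__main__":
    docs = [
        "Paris is the capital of France. It hosts the Louvre museum.",
        "Bananas are rich in potassium.",
    ]
    summary = extractive_compress("What is the capital of France?", docs)
    print(repr(summary), compression_rate(summary, docs))
    # An unrelated query yields an empty summary (selective augmentation).
    print(repr(extractive_compress("Who wrote Hamlet?", docs)))
```

In the approach described above, the scorer would be a compressor trained on end-task performance rather than a lexical heuristic; the sketch only illustrates the interface: documents and an input go in, and a short summary or an empty string comes out and is prepended to the LM input.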