As retrieval-augmented generation (RAG) tackles complex tasks, increasingly expanded contexts offer richer information, but at the cost of higher latency and increased cognitive load on the model. To mitigate this bottleneck, especially for intricate multi-hop questions, we introduce BRIEF-Pro. It is a universal, lightweight compressor that distills relevant evidence for a given query from retrieved documents into a concise summary for seamless integration into in-context RAG. Using seed data consisting of relatively short contexts (fewer than 1k words), BRIEF-Pro is trained to perform abstractive compression of extended contexts exceeding 10k words across a wide range of scenarios. Furthermore, BRIEF-Pro offers flexible user control over summary length by allowing users to specify the desired number of sentences. Experiments on four open-domain multi-hop question-answering datasets show that BRIEF-Pro generates more concise and relevant summaries, enhancing performance across small, large, and proprietary language models. With the 70B reader model, 32x compression by BRIEF-Pro improves QA performance by 4.67% on average over LongLLMLingua's 9x, while requiring only 23% of its computational overhead.
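To make the pipeline concrete, here is a minimal sketch of where a BRIEF-Pro-style compressor sits in in-context RAG. The real system is a trained abstractive model; the extractive query-overlap scorer below is only a toy placeholder, and all function names are illustrative, not the paper's API. Note the `num_sentences` argument mirroring the user-controlled summary length.

```python
import re


def compress(query: str, documents: list[str], num_sentences: int) -> str:
    """Toy stand-in for the compressor: ranks sentences by query-term
    overlap and keeps the top `num_sentences`. BRIEF-Pro instead performs
    learned abstractive compression over 10k+ word contexts."""
    terms = set(re.findall(r"\w+", query.lower()))
    sentences = [s.strip()
                 for doc in documents
                 for s in re.split(r"(?<=[.!?])\s+", doc)
                 if s.strip()]
    # Sort by descending overlap with the query's terms.
    ranked = sorted(
        sentences,
        key=lambda s: -len(terms & set(re.findall(r"\w+", s.lower()))),
    )
    return " ".join(ranked[:num_sentences])


def rag_prompt(query: str, documents: list[str], num_sentences: int = 2) -> str:
    """Build the reader prompt from the compressed context rather than
    the full retrieved documents, cutting latency for the reader model."""
    summary = compress(query, documents, num_sentences)
    return f"Context: {summary}\nQuestion: {query}\nAnswer:"


docs = [
    "Paris is the capital of France. It is known for the Eiffel Tower.",
    "Berlin is the capital of Germany.",
]
prompt = rag_prompt("What is the capital of France?", docs, num_sentences=1)
```

The prompt handed to the reader now contains a one-sentence summary instead of both documents, which is the compression-before-reading pattern the abstract describes.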