The growing popularity of Large Language Models (LLMs) has sparked interest in context compression for these models. However, the performance of previous methods degrades dramatically as the compression ratio increases, sometimes falling to the closed-book level. This decline can be attributed to the loss of key information during compression. Our preliminary study supports this hypothesis and underscores the importance of retaining key information to maintain model performance at high compression ratios. We therefore introduce the Query-Guided Compressor (QGC), which uses the query to guide context compression so that key information is preserved in the compressed context. We additionally employ a dynamic compression strategy. We validate QGC on the question answering task using the NaturalQuestions, TriviaQA, and HotpotQA datasets. Experimental results show that QGC performs consistently well even at high compression ratios, while also offering significant benefits in inference cost and throughput.