Large Language Models (LLMs) demonstrate exceptional capabilities across a wide range of scenarios. However, in long-context scenarios they suffer from redundant information and are sensitive to the position of key information (i.e., information relevant to the input question), leading to inferior performance. To address these challenges, we present Perception Compressor, a training-free prompt compression method. It comprises a perception retriever that leverages guiding questions and the instruction to retrieve the most relevant demonstrations, a dual-slope ratio allocator that dynamically allocates compression ratios and open-book ratios, and a semi-guided iterative compression scheme that retains key information at the token level while removing tokens that distract the LLM. We conduct extensive experiments on long-context benchmarks, i.e., NaturalQuestions, LongBench, and MuSiQue. Experimental results show that Perception Compressor outperforms existing methods by a large margin, achieving state-of-the-art performance.