Despite the recent success of Large Language Models (LLMs), it remains challenging to feed LLMs long prompts because of their fixed input size. Prompt compression has emerged as a promising remedy that removes redundant tokens from the prompt. However, existing works that rely on an LLM for compression require additional computational resources and incur memory overhead. To address this, we propose ICPC (In-context Prompt Compression), a novel and scalable prompt compression method that adaptively reduces prompt length. The key idea of ICPC is to compute the probability of each word appearing in the prompt with an encoder and to quantify the information each word carries through an information function, which effectively reduces information loss during compression and increases compression speed. Empirically, we demonstrate that ICPC effectively compresses long texts across categories and thus achieves better performance and speed on a range of NLP tasks.