Prompt compression is crucial for enhancing inference speed, reducing costs, and improving user experience. However, current methods face challenges such as low compression ratios and potential data leakage during evaluation. To address these issues, we propose 500xCompressor, a method that compresses extensive natural language contexts into as few as one special token. 500xCompressor introduces approximately 0.3% additional parameters and achieves compression ratios ranging from 6x to 480x. It is designed to compress any text and answer various types of questions, and it can be used by the original large language model (LLM) without requiring fine-tuning. 500xCompressor was pretrained on the Arxiv Corpus, fine-tuned on the ArxivQA dataset, and then evaluated on strictly unseen, classical question answering (QA) datasets. The results demonstrate that the LLM retained 62.26-72.89% of its capabilities compared with using non-compressed prompts. This study also shows that not all compressed tokens are equally utilized and that KV values have significant advantages over embeddings in preserving information at high compression ratios. The high compressibility of natural language prompts, even for fine-grained, complex information, suggests promising potential for future applications and further research into developing a new LLM language.
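To make the core idea concrete, the sketch below illustrates (under assumptions, not the authors' released implementation) how the key-value (KV) states of a handful of compressed-token positions, produced by an encoder pass over the long context, could be handed to the frozen original LLM in place of the full prompt. The backbone model name, the helper names `compress_to_kv` and `answer_with_compressed_kv`, the number of compressed slots, and the greedy decoding loop are all illustrative assumptions; the sketch also glosses over details such as positional alignment of the cached slots that a real implementation must handle.

```python
# Conceptual sketch: condition a frozen LLM on the KV states of a few
# compressed-token positions instead of the full long prompt.
# Model name, helper names, and slot count are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"   # assumed small open backbone for illustration
NUM_COMPRESSED_SLOTS = 16                    # e.g. a ~500-token context kept as 16 KV slots

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
decoder = AutoModelForCausalLM.from_pretrained(MODEL_NAME)  # frozen original LLM
decoder.eval()


def compress_to_kv(encoder, context: str, num_slots: int):
    """Run the encoder over the long context and keep only the per-layer
    key/value states of the last `num_slots` positions (illustrative only;
    the actual method appends dedicated special tokens and trains LoRA
    adapters, roughly 0.3% extra parameters, to fill them)."""
    ids = tokenizer(context, return_tensors="pt").input_ids
    with torch.no_grad():
        out = encoder(input_ids=ids, use_cache=True)
    past = out.past_key_values
    if hasattr(past, "to_legacy_cache"):  # newer transformers returns a Cache object
        past = past.to_legacy_cache()
    # Slice each layer's (key, value) tensors down to the compressed slots.
    return tuple(
        (k[:, :, -num_slots:, :], v[:, :, -num_slots:, :]) for k, v in past
    )


def answer_with_compressed_kv(question: str, compressed_kv, max_new_tokens: int = 64):
    """Greedily decode an answer conditioned only on the compressed KV cache;
    the decoder never sees the original long context."""
    cache = DynamicCache.from_legacy_cache(compressed_kv)
    next_ids = tokenizer(question, return_tensors="pt").input_ids
    generated = []
    for _ in range(max_new_tokens):
        with torch.no_grad():
            out = decoder(input_ids=next_ids, past_key_values=cache, use_cache=True)
        cache = out.past_key_values
        next_tok = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy choice
        if next_tok.item() == tokenizer.eos_token_id:
            break
        generated.append(next_tok.item())
        next_ids = next_tok
    return tokenizer.decode(generated, skip_special_tokens=True)


# Hypothetical usage: `encoder` would be the same backbone plus trained LoRA adapters.
# kv = compress_to_kv(encoder, long_document_text, NUM_COMPRESSED_SLOTS)
# print(answer_with_compressed_kv("What does the document claim?", kv))
```

The design choice the sketch highlights is the one the abstract attributes to the method: storing the compressed information as per-layer KV states rather than as input embeddings, so the frozen LLM can attend to it directly during decoding.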