Large language models (LLMs) require lengthy prompts as input context to produce outputs aligned with user intentions, which incurs extra cost during inference. In this paper, we propose the Gist COnditioned deCOding (Gist-COCO) model, a novel prompt compression method that can also assist prompt interpretation and engineering. Gist-COCO employs an encoder-decoder language model and incorporates an additional encoder as a plugin module that compresses prompts together with their inputs using gist tokens. It finetunes the compression plugin module and uses the representations of the gist tokens to emulate the raw prompts in the vanilla language model. By verbalizing the representations of gist tokens into gist prompts, the compression ability of Gist-COCO can be generalized to different LLMs with high compression rates. Our experiments demonstrate that Gist-COCO outperforms previous prompt compression models on both passage and instruction compression tasks. Further analysis of the gist verbalization results suggests that gist prompts serve different functions in aiding language models: they may directly provide potential answers, generate a chain of thought, or simply repeat the inputs. All data and code are available at https://github.com/OpenMatch/Gist-COCO .
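To make the notion of a compression rate concrete, the following minimal sketch computes the fraction of prompt tokens saved when a raw prompt is replaced by a fixed number of gist tokens. The definition used here (one minus the ratio of gist tokens to prompt tokens), the function name `compression_rate`, and the token counts in the usage line are illustrative assumptions; the abstract does not specify how the rate is measured.

```python
def compression_rate(prompt_tokens: int, gist_tokens: int) -> float:
    """Fraction of tokens saved by replacing a prompt with gist tokens.

    Assumed definition (not taken from the paper):
        rate = 1 - gist_tokens / prompt_tokens
    """
    if prompt_tokens <= 0:
        raise ValueError("prompt_tokens must be positive")
    if not 0 <= gist_tokens <= prompt_tokens:
        raise ValueError("gist_tokens must be between 0 and prompt_tokens")
    return 1.0 - gist_tokens / prompt_tokens


# Hypothetical numbers: a 512-token prompt replaced by 64 gist tokens.
rate = compression_rate(512, 64)
print(f"compression rate: {rate:.3f}")
```

Under this assumed definition, a higher value means a larger share of the original prompt has been removed before the language model sees it.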