Instruction-tuned language models (LMs) are widely used to address a variety of problems through task-specific prompts. Limits on context window length and computational cost motivate the development of compressed prompts. Existing methods rely heavily on trained embeddings, each designed to accommodate multiple token meanings. Such embeddings are hard to interpret, require a fixed number of embedding tokens, are difficult to reuse across different LMs, and cannot be applied when interacting with black-box APIs. This study proposes prompt compression with reinforcement learning (PCRL), a novel discrete prompt compression method that addresses these issues. PCRL employs a computationally efficient policy network that directly edits prompts. The PCRL training approach can be flexibly applied to various types of LMs, including decoder-only and encoder-decoder architectures, and requires neither gradient access to the LMs nor labeled data. PCRL achieves an average reduction of 24.6% in token count across various instruction prompts while preserving performance. Further, we demonstrate that the learned policy can be transferred to larger LMs, and, through various analyses, we aid the understanding of token importance within prompts.
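To make the idea of discrete, gradient-free prompt editing concrete, the following is a minimal sketch and not the PCRL implementation. It assumes a per-token keep/drop policy with independent Bernoulli probabilities, trained with plain REINFORCE. The function `toy_black_box_score`, the reward shape, the penalty value, and the example prompt are all hypothetical stand-ins; in particular, the scorer merely imitates querying a frozen LM through an API and returns a number.

```python
import math
import random


def apply_mask(tokens, keep):
    """Return the tokens whose keep flag is 1 (a discrete edit; the result is still plain text)."""
    return [t for t, k in zip(tokens, keep) if k]


def toy_black_box_score(tokens):
    """Hypothetical stand-in for a frozen LM: 1.0 if the task-critical words survive, else 0.0."""
    required = {"summarize", "article"}
    return 1.0 if required.issubset(tokens) else 0.0


def reward(tokens, keep, penalty=-1.0):
    """Assumed reward: fraction of tokens removed if the score is preserved, else a fixed penalty."""
    compressed = apply_mask(tokens, keep)
    if toy_black_box_score(compressed) < 1.0:
        return penalty
    return 1.0 - len(compressed) / len(tokens)


def train(tokens, steps=2000, lr=0.5, seed=0):
    """REINFORCE over per-token keep logits; only scalar rewards are used, never LM gradients."""
    rng = random.Random(seed)
    logits = [2.0] * len(tokens)  # start by keeping almost every token
    baseline = 0.0                # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
        keep = [1 if rng.random() < p else 0 for p in probs]
        r = reward(tokens, keep)
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)
        # Gradient of log Bernoulli(k; sigmoid(z)) with respect to z is (k - sigmoid(z)).
        logits = [z + lr * advantage * (k - p) for z, k, p in zip(logits, keep, probs)]
    return logits


prompt = "please could you kindly summarize the following article".split()
logits = train(prompt)
greedy_keep = [1 if z > 0 else 0 for z in logits]
compressed = apply_mask(prompt, greedy_keep)
print(" ".join(compressed))
```

Because the policy only selects which existing tokens to keep, its output remains readable text that can be sent to any LM, including one behind a black-box API. In this toy setup the policy is pushed toward retaining the words the scorer depends on and dropping filler words, although the exact result depends on the random seed and the assumed reward.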