Compressed prompts help instruction-tuned language models (LMs) overcome context window limitations and reduce computational costs. Existing methods, which are primarily based on training embeddings, face challenges related to interpretability, the fixed number of embedding tokens, reusability across different LMs, and inapplicability when interacting with black-box APIs. This study proposes prompt compression with reinforcement learning (PCRL), a discrete prompt compression method that addresses these issues. PCRL uses a computationally efficient policy network that edits prompts directly. Its training approach can be applied flexibly to various types of LMs, including both decoder-only and encoder-decoder architectures, and it requires neither gradient access to the LMs nor labeled data. PCRL achieves an average reduction of 24.6\% in token count across various instruction prompts while maintaining sufficient performance. In addition, we demonstrate that the learned policy can be transferred to larger LMs, and through a comprehensive analysis, we explore token importance within the prompts.
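To make the idea of discrete prompt editing concrete, the following is a minimal sketch and not the PCRL implementation. It assumes, for illustration only, that an edit is a per-token keep/drop decision and that the training signal trades token reduction against how well the LM output is preserved. The names `compress_prompt`, `token_reduction`, and `reward`, the whitespace tokenization, the similarity threshold of 0.9, and the penalty of -1.0 are all hypothetical choices that the abstract does not specify.

```python
from typing import List, Sequence


def compress_prompt(tokens: Sequence[str], keep_mask: Sequence[int]) -> List[str]:
    """Apply a per-token keep (1) / drop (0) decision to a prompt.

    In a learned setting the mask would come from a policy network;
    here it is supplied by hand (assumption for illustration).
    """
    if len(tokens) != len(keep_mask):
        raise ValueError("tokens and keep_mask must have the same length")
    return [tok for tok, keep in zip(tokens, keep_mask) if keep]


def token_reduction(original: Sequence[str], compressed: Sequence[str]) -> float:
    """Fraction of tokens removed, e.g. 0.246 for a 24.6% reduction."""
    if not original:
        return 0.0
    return 1.0 - len(compressed) / len(original)


def reward(similarity: float, reduction: float, threshold: float = 0.9) -> float:
    """Hypothetical reward: pay out the reduction only if the LM output
    for the compressed prompt stays close enough to the original output.
    The similarity score needs only LM outputs, not gradients or labels."""
    return reduction if similarity >= threshold else -1.0


# Whitespace tokenization is a simplification; a real system would use
# the LM's own tokenizer.
tokens = "Please kindly summarize the following article in one sentence".split()
mask = [0, 0, 1, 1, 0, 1, 1, 1, 1]
compressed = compress_prompt(tokens, mask)
reduction = token_reduction(tokens, compressed)
```

Because the compressed prompt is still plain text, it remains readable and can be sent to any LM, including one behind a black-box API, which is the property the abstract contrasts with embedding-based compression.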