Instruction fine-tuning is the conventional way to adapt Large Language Models (LLMs) to a variety of tasks. However, it often requires substantial computational resources, which makes it impractical for individuals or small-scale entities. Recently, Low-Rank Adaptation (LoRA) has emerged as a promising alternative, offering capabilities on par with full fine-tuning at reduced resource overhead. Even so, attaining satisfactory performance by fine-tuning with LoRA is non-trivial. In this paper, we propose PILLOW, which aims to improve LoRA's performance through a discrimination-based prompting method that leverages LLMs' In-Context Learning ability. PILLOW incorporates a matching network that selects prompts from a user-defined prompt pool, concatenates the selected prompts with the user instruction as input, and performs inference using the LoRA-fine-tuned LLM. Trained with Reinforcement Learning, PILLOW achieves performance commensurate with typical instruction fine-tuning methods on various evaluation metrics, while using only consumer-grade GPU resources and substantially reducing computational costs.
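The select-and-concatenate step described in the abstract can be illustrated with a minimal sketch. It is not the PILLOW implementation: the learned, RL-trained matching network is replaced here by a toy bag-of-words cosine scorer, and the names `embed`, `cosine`, `select_prompts`, and `build_input` are hypothetical helpers introduced only for illustration. The LoRA-fine-tuned LLM itself is omitted; the sketch only builds the text that would be passed to it.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a learned encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_prompts(instruction, prompt_pool, k=1):
    """Rank the pool by similarity to the instruction and keep the top k.

    Assumption: PILLOW scores prompts with a trained matching network;
    this sketch substitutes a fixed similarity score.
    """
    query = embed(instruction)
    ranked = sorted(
        prompt_pool, key=lambda p: cosine(query, embed(p)), reverse=True
    )
    return ranked[:k]


def build_input(instruction, prompt_pool, k=1):
    """Concatenate the selected prompts with the user instruction."""
    selected = select_prompts(instruction, prompt_pool, k)
    return "\n\n".join(selected + [instruction])
```

For example, with a pool containing one translation demonstration and one summarization demonstration, `build_input` would place the translation demonstration in front of a new translation instruction, and the resulting string would then be fed to the LoRA-fine-tuned model.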