The remarkable instruction-following capability of large language models (LLMs) has sparked growing interest in automatically learning suitable prompts. However, while many effective methods have been proposed, they do not account for the cost incurred during the learning process (e.g., querying the LLM and evaluating its responses). To overcome this limitation, this work explicitly incorporates a finite budget constraint into prompt learning. To develop principled solutions, a novel connection is established between prompt learning and fixed-budget best arm identification (BAI-FB) in multi-armed bandits (MAB): each candidate prompt is treated as an arm, and each evaluation of a prompt consumes part of the budget. Based on this connection, a general framework, TRIPLE (besT aRm Identification for Prompt LEarning), is proposed to systematically harness the power of BAI-FB in prompt learning. Unique characteristics of prompt learning further lead to two embedding-based enhancements of TRIPLE that exploit the ideas of clustering and function approximation. Extensive experiments on multiple widely adopted tasks using both GPT-3.5 and Llama2 demonstrate that TRIPLE significantly improves performance over previous baselines while satisfying the limited budget constraints.
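To make the connection between prompt selection and BAI-FB concrete, the following is a minimal sketch of Sequential Halving, a standard fixed-budget best arm identification algorithm, applied to a pool of candidate prompts. The abstract does not state which BAI-FB algorithm TRIPLE uses, so this should be read as an illustration of the general idea rather than the authors' method. The names `sequential_halving` and `evaluate`, and the simulated scores in the demo, are assumptions introduced here; in practice `evaluate` would query an LLM with the prompt on a sampled input and score the response.

```python
import math
import random
from typing import Callable, Sequence


def sequential_halving(
    prompts: Sequence[str],
    evaluate: Callable[[str], float],
    budget: int,
) -> str:
    """Pick a prompt using at most `budget` calls to `evaluate`.

    `evaluate(prompt)` is a hypothetical scorer returning one noisy
    reward for the prompt (e.g., 1.0 if the LLM response is correct).
    """
    n = len(prompts)
    if n == 0:
        raise ValueError("need at least one candidate prompt")

    # Halving n arms down to one takes ceil(log2(n)) rounds.
    rounds = max(1, math.ceil(math.log2(n)))
    per_round = budget // rounds  # equal share of the budget per round
    if n > 1 and per_round < n:
        raise ValueError("budget too small to evaluate every prompt once")

    survivors = list(range(n))  # track arms by index
    while len(survivors) > 1:
        pulls = per_round // len(survivors)
        # Estimate each surviving prompt's mean reward from `pulls` samples.
        means = {
            i: sum(evaluate(prompts[i]) for _ in range(pulls)) / pulls
            for i in survivors
        }
        # Keep the better half (rounded up) and discard the rest.
        survivors.sort(key=lambda i: means[i], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return prompts[survivors[0]]


if __name__ == "__main__":
    # Simulated success rates stand in for real LLM evaluations.
    rng = random.Random(0)
    true_rates = {"prompt A": 0.55, "prompt B": 0.70, "prompt C": 0.40}

    def simulated_evaluate(prompt: str) -> float:
        return 1.0 if rng.random() < true_rates[prompt] else 0.0

    print(sequential_halving(list(true_rates), simulated_evaluate, budget=300))
```

The design point this illustrates is that the budget is fixed in advance and split across elimination rounds, so weak prompts stop consuming evaluations early and the remaining budget concentrates on the promising candidates.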