Automatic prompt optimization is an important approach to improving the performance of large language models (LLMs). Recent research demonstrates the potential of using LLMs as prompt optimizers, which generate improved task prompts through iterative refinement. In this paper, we propose a novel perspective for investigating the design of LLM-based prompt optimizers by drawing an analogy with gradient-based model optimizers. To connect the two approaches, we identify two pivotal factors in model parameter learning: the update direction and the update method. Focusing on these two aspects, we borrow the theoretical framework and learning methods of gradient-based optimization to design improved strategies for LLM-based prompt optimizers. By systematically analyzing a rich set of improvement strategies, we develop a capable Gradient-inspired LLM-based Prompt Optimizer called GPO. At each step, GPO first retrieves relevant prompts from the optimization trajectory to serve as the update direction. It then applies a generation-based refinement strategy to perform the update, while controlling the edit distance through a cosine-based decay strategy. Extensive experiments demonstrate the effectiveness and efficiency of GPO. In particular, compared with baseline methods, GPO brings an additional improvement of up to 56.8% on Big-Bench Hard and 55.3% on MMLU.
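To make the cosine-based decay of the edit distance more concrete, the following is a minimal sketch of one way such a schedule could look. The abstract does not give the exact formula, so the function name `cosine_edit_budget`, the word-level budget, and the `max_edits` and `min_edits` bounds are all illustrative assumptions rather than GPO's actual implementation. The idea mirrors cosine learning-rate decay: early steps may rewrite much of the prompt, while later steps are restricted to small edits.

```python
import math


def cosine_edit_budget(step: int, total_steps: int,
                       max_edits: int = 20, min_edits: int = 2) -> int:
    """Hypothetical cap on how many words a refinement may change at `step`.

    Assumed schedule (not taken from the paper): the cap decays from
    `max_edits` to `min_edits` along half a cosine wave.
    """
    if total_steps <= 1:
        return max_edits
    # Clamp so that out-of-range steps still yield a valid budget.
    step = min(max(step, 0), total_steps - 1)
    progress = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # falls from 1.0 to 0.0
    return round(min_edits + (max_edits - min_edits) * cosine)


# Budgets for a five-step optimization run under the assumed defaults.
budgets = [cosine_edit_budget(t, total_steps=5) for t in range(5)]
print(budgets)  # [20, 17, 11, 5, 2]
```

In a full optimizer, a budget like this would presumably be passed to the LLM as a constraint in the refinement instruction (for example, "change at most N words"). That wiring is omitted here because the abstract does not describe it.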