Automatic prompt optimization is an important approach to improving the performance of large language models (LLMs). Recent research demonstrates the potential of using LLMs as prompt optimizers, which generate improved task prompts through iterative refinement. In this paper, we propose a novel perspective for investigating the design of LLM-based prompt optimizers by drawing an analogy with gradient-based model optimizers. To connect the two approaches, we identify two pivotal factors in model parameter learning: the update direction and the update method. Focusing on these two aspects, we borrow the theoretical framework and learning methods of gradient-based optimization to design improved strategies for LLM-based prompt optimizers. Based on a systematic analysis of a rich set of improvement strategies, we further develop a capable Gradient-inspired LLM-based Prompt Optimizer called GPO. At each step, GPO first retrieves relevant prompts from the optimization trajectory to serve as the update direction. It then performs the update with a generation-based refinement strategy, while controlling the edit distance through a cosine-based decay strategy. Extensive experiments demonstrate the effectiveness and efficiency of GPO. In particular, GPO brings an additional improvement of up to 56.8% on Big-Bench Hard and 55.3% on MMLU compared to baseline methods.
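To make the cosine-based control of the edit distance concrete, the following is a minimal sketch rather than the implementation used by GPO. The abstract does not give the schedule, so we assume the standard cosine annealing form, applied to a budget of word-level edits. The names `cosine_edit_budget`, `word_edit_distance`, and `within_budget`, as well as the parameters `max_edits` and `min_edits`, are hypothetical and introduced only for illustration.

```python
import math


def cosine_edit_budget(step, total_steps, max_edits, min_edits=0):
    """Assumed schedule: the number of words that may change at `step`,
    decayed from `max_edits` to `min_edits` along a half cosine."""
    if total_steps <= 1:
        return max_edits
    # Clamp the step into [0, total_steps - 1] and map it to [0, 1].
    progress = min(max(step, 0), total_steps - 1) / (total_steps - 1)
    scale = 0.5 * (1.0 + math.cos(math.pi * progress))
    return round(min_edits + (max_edits - min_edits) * scale)


def word_edit_distance(a, b):
    """Word-level Levenshtein distance between two prompts."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        curr = [i]
        for j, wy in enumerate(y, start=1):
            cost = 0 if wx == wy else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def within_budget(old_prompt, new_prompt, step, total_steps, max_edits):
    """Check whether a candidate prompt respects the current edit budget."""
    budget = cosine_edit_budget(step, total_steps, max_edits)
    return word_edit_distance(old_prompt, new_prompt) <= budget
```

Under these assumptions, early steps allow large rewrites of the prompt and later steps allow only small ones, which mirrors the role of a decaying learning rate in gradient-based optimization. A candidate that exceeds the budget could be rejected or regenerated; that choice is left open in this sketch.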