Large language models have demonstrated remarkable capabilities, but their performance relies heavily on effective prompt engineering. Automatic prompt optimization (APO) methods are designed to automate this process and can be broadly categorized into those targeting instructions (instruction optimization, IO) and those targeting exemplars (exemplar optimization, EO). Despite their shared objective, the two families have evolved largely independently, with IO receiving more research attention recently. This paper seeks to bridge this gap by comprehensively comparing representative IO and EO techniques, both in isolation and in combination, on a diverse set of challenging tasks. Our findings reveal that intelligently reusing model-generated input-output pairs, obtained from evaluating prompts on the validation set, as exemplars consistently improves performance on top of IO methods, yet this strategy remains under-investigated. We also find that, despite the recent focus on IO, how we select exemplars can outweigh how we optimize instructions: EO strategies as simple as random search, applied with unoptimized seed instructions, can outperform state-of-the-art IO methods. Moreover, we observe a synergy between EO and IO, with optimal combinations surpassing their individual contributions. We conclude that studying exemplar optimization, both as a standalone method and in optimal combination with instruction optimization, remains a crucial aspect of APO and deserves greater consideration in future research, even in the era of highly capable instruction-following models.
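The random-search EO baseline mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy exemplar pool, and the stand-in scoring function are all hypothetical; in practice `evaluate` would measure a prompt's accuracy on a held-out validation set.

```python
import random

def random_search_eo(pool, k, evaluate, n_trials=20, seed=0):
    """Randomly sample k-exemplar subsets from a candidate pool and keep
    the subset that scores best under the given evaluation function."""
    rng = random.Random(seed)
    best_score, best_exemplars = float("-inf"), None
    for _ in range(n_trials):
        candidate = rng.sample(pool, k)       # one random exemplar subset
        score = evaluate(candidate)           # e.g. validation-set accuracy
        if score > best_score:
            best_score, best_exemplars = score, candidate
    return best_exemplars, best_score

# Toy usage: the pool and scoring function are illustrative stand-ins.
pool = [("2+2", "4"), ("3+5", "8"), ("1+9", "10"), ("7+6", "13")]

def toy_eval(exemplars):
    # Stand-in for a real validation-set metric.
    return sum(len(answer) for _, answer in exemplars)

exemplars, score = random_search_eo(pool, k=2, evaluate=toy_eval)
```

Even this trivially simple search over exemplar subsets is, per the findings above, a surprisingly strong baseline relative to optimizing the instruction alone.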