Prompt optimization is essential for enhancing the performance of Large Language Models (LLMs) across a range of Natural Language Processing (NLP) tasks, particularly in few-shot learning scenarios, where training examples are incorporated directly into the prompt. Despite growing interest in optimizing prompts with few-shot examples, existing prompt optimization methods are often resource-intensive or perform inadequately. In this work, we propose PrOmpting with Episodic Memory (POEM), a novel prompt optimization technique that is simple, efficient, and generalizes well. We frame prompt optimization as a Reinforcement Learning (RL) problem, using an episodic memory to archive combinations of input data, permutations of few-shot examples, and the rewards observed during training. In the testing phase, we optimize the sequence of examples for each test query by selecting the permutation that yields the highest total reward across the top-k most similar training examples in the episodic memory. Our results show that POEM outperforms recent techniques such as TEMPERA and RLPrompt by over 5.3% on various text classification tasks. Furthermore, our approach adapts well to broader language understanding tasks, consistently outperforming conventional heuristic methods for ordering examples.
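The test-time lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, inputs are assumed to be pre-embedded vectors, similarity is taken as cosine similarity, and the stored rewards are assumed to be scalar scores recorded during RL training.

```python
# Illustrative sketch of episodic-memory retrieval: store
# (input embedding, example permutation, reward) triples during
# training, then at test time pick the permutation with the
# highest total reward among the top-k most similar stored inputs.
# All names here are assumptions, not POEM's actual API.
import numpy as np

class EpisodicMemory:
    def __init__(self):
        self.embeddings = []   # embedding of each training input
        self.records = []      # (permutation, reward) observed for that input

    def write(self, embedding, permutation, reward):
        self.embeddings.append(np.asarray(embedding, dtype=float))
        self.records.append((tuple(permutation), float(reward)))

    def best_permutation(self, query_embedding, k=3):
        """Return the permutation with the highest summed reward
        over the top-k stored inputs most similar to the query."""
        q = np.asarray(query_embedding, dtype=float)
        emb = np.stack(self.embeddings)
        # cosine similarity between the query and all stored inputs
        sims = emb @ q / (np.linalg.norm(emb, axis=1) * np.linalg.norm(q) + 1e-12)
        top_k = np.argsort(sims)[::-1][:k]
        # aggregate rewards per permutation over the retrieved neighbors
        totals = {}
        for i in top_k:
            perm, reward = self.records[i]
            totals[perm] = totals.get(perm, 0.0) + reward
        return max(totals, key=totals.get)

memory = EpisodicMemory()
memory.write([1.0, 0.0], (0, 1, 2), 0.9)
memory.write([0.9, 0.1], (2, 1, 0), 0.2)
memory.write([0.0, 1.0], (1, 0, 2), 0.8)
# A query close to the first two entries retrieves them as neighbors,
# and the permutation (0, 1, 2) wins on total reward.
best = memory.best_permutation([1.0, 0.05], k=2)
```

The key design choice this sketch reflects is that no policy network is queried at test time: ordering a new query's few-shot examples reduces to a nearest-neighbor lookup plus a reward aggregation, which is what makes the approach cheap relative to RL-based prompt editors.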