Developers using LLMs and LLM-based agents in their applications have provided plenty of anecdotal evidence that in-context learning (ICL) is fragile. In this paper, we show that in addition to the quantity and quality of examples, the order in which in-context examples are listed in the prompt affects the output of the LLM and, consequently, its performance. While prior work has explored improving ICL through dataset-dependent techniques, we introduce OptiSeq, a purely inference-time, dataset-free optimization method that efficiently determines the best example order. OptiSeq leverages the log probabilities of LLM-generated outputs to systematically prune the search space of possible orderings and recommend the best order(s), distinguishing orderings that yield high accuracy from those that underperform. Extensive empirical evaluation on multiple LLMs, datasets, and prompts demonstrates that OptiSeq improves accuracy by 5.5 to 10.5 percentage points across multiple tasks.
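The core idea of ranking example orderings by output log probability can be sketched minimally as follows. This is an illustrative stand-in, not OptiSeq itself: `logprob_fn` is a hypothetical callable that would query an LLM and return the mean token log probability of its generated output for a prompt built from the given ordering, and the exhaustive enumeration shown here omits OptiSeq's pruning of the search space.

```python
import itertools

def rank_orderings(examples, logprob_fn, top_k=2):
    """Score each permutation of in-context examples with logprob_fn
    (assumed to return the mean token log probability of the model's
    output for that ordering) and return the top_k highest-scoring
    orderings. Exhaustive search shown for clarity; a real system
    would prune rather than enumerate all n! permutations."""
    scored = []
    for order in itertools.permutations(examples):
        scored.append((logprob_fn(order), order))
    # Higher log probability first: more confident model output.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [order for _, order in scored[:top_k]]
```

A toy scorer that rewards orderings starting with a particular example shows the intended usage: `rank_orderings(["a", "b", "c"], toy_scorer)` returns the two orderings the scorer prefers.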