Large language models (LLMs) demonstrate exceptional instruction-following ability across a variety of downstream tasks. Although this ability makes LLMs flexible task solvers, their task performance also depends heavily on how the instructions are worded. In this paper, we reveal that LLMs are over-sensitive to lexical variations in task instructions, even when the variations are imperceptible to humans. When models are given neighborhood instructions, which lie close together in the latent representation space and differ by only one semantically similar word, their performance on downstream tasks can differ vastly. Building on this property, we propose a black-box Combinatorial Optimization framework for Prompt Lexical Enhancement (COPLE). COPLE performs iterative lexical optimization guided by feedback from a batch of proxy tasks, using a search strategy related to word influence. Experiments show that even widely used human-crafted prompts for current benchmarks suffer from the lexical sensitivity of models, and that COPLE recovers the degraded model ability in both instruction following and solving downstream tasks.
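
The iterative lexical optimization described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that word influence is estimated as the score change when a word is deleted, and that the search is a greedy pass over words in descending influence. The names `score`, `candidates`, `word_influence`, `optimize_prompt`, and the toy helpers are hypothetical: `score` stands in for a black-box evaluation of the model on a batch of proxy tasks, and `candidates` for any source of semantically similar substitutes.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical black-box interfaces (assumptions, not part of the paper's text):
Scorer = Callable[[str], float]                       # prompt -> score on proxy tasks
Candidates = Callable[[List[str], int], Sequence[str]]  # (words, index) -> substitutes


def word_influence(words: List[str], score: Scorer) -> List[float]:
    """Assumed influence measure: absolute score change when a word is deleted."""
    base = score(" ".join(words))
    influence = []
    for i in range(len(words)):
        ablated = words[:i] + words[i + 1:]
        influence.append(abs(base - score(" ".join(ablated))))
    return influence


def optimize_prompt(prompt: str, score: Scorer, candidates: Candidates,
                    rounds: int = 1) -> Tuple[str, float]:
    """Greedy one-word-at-a-time substitution, most influential words first."""
    words = prompt.split()
    best = score(" ".join(words))
    for _ in range(rounds):
        influence = word_influence(words, score)
        order = sorted(range(len(words)), key=lambda i: influence[i], reverse=True)
        for i in order:
            for sub in candidates(words, i):
                trial = words[:i] + [sub] + words[i + 1:]
                trial_score = score(" ".join(trial))
                if trial_score > best:  # keep a substitution only if it helps
                    words, best = trial, trial_score
    return " ".join(words), best


# Toy stand-ins so the sketch runs without a model; real use would query an LLM.
def toy_score(prompt: str) -> float:
    bonus = {"classify": 0.2, "sentiment": 0.1}
    tokens = prompt.split()
    return 0.5 + sum(v for w, v in bonus.items() if w in tokens)


TOY_SYNONYMS = {"determine": ["classify", "decide"], "review": ["text"]}


def toy_candidates(words: List[str], i: int) -> Sequence[str]:
    return TOY_SYNONYMS.get(words[i], [])


if __name__ == "__main__":
    print(optimize_prompt("determine the sentiment of the review",
                          toy_score, toy_candidates))
```

Because the scorer is only ever called with a complete prompt string, the sketch stays black-box: it needs no gradients or model internals, only task-level feedback.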