In many real-world applications, users rely on natural language instructions to guide large language models (LLMs) across a wide range of tasks. These instructions are often complex, diverse, and subject to frequent change. However, LLMs do not always attend to these instructions reliably, and users lack simple mechanisms to emphasize their importance beyond rewording or restructuring the prompt. To address this, we present an inference-time method that lets users emphasize specific parts of their prompt by steering the model's attention toward them, aligning the model's perceived importance of prompt tokens with user intent. Unlike prior approaches that are limited to static instructions, require significant offline profiling, or rely on fixed biases, we dynamically update the proportion of model attention given to the user-specified parts, improving instruction following without degrading performance. We demonstrate that our approach improves instruction following across a variety of tasks involving multiple instructions and generalizes across models of varying scales.
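The abstract does not specify the exact update rule, so the following is only a minimal PyTorch sketch of the general idea of attention steering under stated assumptions: given a row-stochastic attention matrix and a boolean mask over user-emphasized key positions, each query row is reweighted so the emphasized span receives a target share of the attention mass, and only when it currently falls short of that share (a crude stand-in for a per-query dynamic update). All names here (`steer_attention`, `emphasis_mask`, `target_frac`) are hypothetical, not from the paper.

```python
import torch

def steer_attention(attn: torch.Tensor, emphasis_mask: torch.Tensor,
                    target_frac: float = 0.5) -> torch.Tensor:
    """Reweight a row-stochastic attention matrix so that key positions
    marked in `emphasis_mask` receive at least `target_frac` of each
    query's attention mass, shrinking the remaining mass proportionally.

    attn:          (batch, heads, q_len, k_len), each row sums to 1.
    emphasis_mask: (k_len,) boolean, True at user-emphasized key positions.
    """
    mask = emphasis_mask.to(attn.dtype)                    # (k_len,)
    # Current attention mass each query places on the emphasized span.
    cur = (attn * mask).sum(dim=-1, keepdim=True)          # (b, h, q, 1)
    # Only boost queries whose span mass is below the target.
    boost = cur < target_frac
    eps = 1e-9
    # Scale emphasized keys up to target_frac of the row's mass and
    # the remaining keys down to 1 - target_frac, keeping rows stochastic.
    scale_in = target_frac / (cur + eps)
    scale_out = (1.0 - target_frac) / (1.0 - cur + eps)
    scaled = attn * (mask * scale_in + (1.0 - mask) * scale_out)
    return torch.where(boost, scaled, attn)

# Example: emphasize key positions 2..4 of an 8-token prompt.
attn = torch.softmax(torch.randn(1, 4, 8, 8), dim=-1)
mask = torch.zeros(8, dtype=torch.bool)
mask[2:5] = True
steered = steer_attention(attn, mask, target_frac=0.5)
# Rows remain valid probability distributions after steering.
assert torch.allclose(steered.sum(-1), torch.ones(1, 4, 8), atol=1e-5)
```

In this sketch the rescaling keeps each row a probability distribution (the emphasized and unemphasized masses sum to one), so it can be applied to the post-softmax weights of selected layers or heads at inference time without retraining; the paper's actual method for choosing the proportion dynamically may differ.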