Activation steering has emerged as a promising approach for efficiently adapting large language models (LLMs) to downstream behaviors. However, most existing steering methods rely on a single static direction per task or concept, making them inflexible under task variation and inadequate for complex tasks that require multiple coordinated capabilities. To address this limitation, we propose STEER2ADAPT, a lightweight framework that adapts LLMs by composing steering vectors rather than learning new ones from scratch. In many domains (e.g., reasoning or safety), tasks share a small set of underlying concept dimensions. STEER2ADAPT captures these dimensions as a reusable, low-dimensional semantic prior subspace, and adapts to new tasks by dynamically discovering a linear combination of basis vectors from only a handful of examples. Experiments across 9 tasks and 3 models in both reasoning and safety domains demonstrate the effectiveness of STEER2ADAPT, achieving an average improvement of 8.2%. Extensive analyses further show that STEER2ADAPT is a data-efficient, stable, and transparent inference-time adaptation method for LLMs.
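To make the core mechanism concrete, here is a minimal sketch of few-shot steering-vector composition as the abstract describes it: a fixed basis of concept directions is reused across tasks, and a task-specific linear combination is fit from a handful of examples. All names (`fit_coefficients`, `steer`) and the projection-based fitting rule are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

# A minimal sketch, assuming a precomputed low-dimensional basis of
# concept directions (the "semantic prior subspace"); the coefficient
# fitting rule below is a hypothetical stand-in for the paper's method.

rng = np.random.default_rng(0)
d, k = 4096, 8                                   # hidden size, basis size
basis = rng.standard_normal((k, d))              # k basis steering vectors
basis /= np.linalg.norm(basis, axis=1, keepdims=True)

def fit_coefficients(pos_acts, neg_acts, basis):
    """Fit task-specific combination weights from a few labeled examples
    by projecting the mean activation difference onto the basis."""
    delta = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)  # shape (d,)
    return basis @ delta                                    # shape (k,)

def steer(hidden, basis, coeffs, alpha=1.0):
    """Add the composed steering vector to a hidden activation."""
    return hidden + alpha * (coeffs @ basis)

# Few-shot adaptation: a handful of example activations per behavior
# (random stand-ins here for activations extracted from an LLM).
pos = rng.standard_normal((8, d))   # desired-behavior activations
neg = rng.standard_normal((8, d))   # undesired-behavior activations
w = fit_coefficients(pos, neg, basis)
h_steered = steer(rng.standard_normal(d), basis, w, alpha=0.5)
```

Because only the k combination weights are task-specific, adaptation under this sketch reduces to estimating a k-dimensional vector rather than a full d-dimensional steering direction, which is what makes the few-example regime plausible.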