Post-training adaptation of language models is commonly achieved through parameter updates, such as fine-tuning and parameter-efficient adaptation, or through input-based methods such as prompting. In parallel, a growing body of work modifies internal activations at inference time to influence model behavior, an approach known as steering. Despite its increasing use, steering is rarely analyzed within the same conceptual framework as established adaptation methods. In this work, we argue that steering should be regarded as a form of model adaptation. We introduce a set of functional criteria for adaptation methods and use them to compare steering approaches with classical alternatives. This analysis positions steering as a distinct adaptation paradigm based on targeted interventions in activation space, enabling local and reversible behavioral change without parameter updates. The resulting framing clarifies how steering relates to existing methods and motivates a unified taxonomy for model adaptation.