In this work, we examine how targeted perturbations in the activation space of Language Models (LMs) can encode complex reasoning patterns. We inject steering vectors, derived from LM activations, into LMs at inference time and study whether these vectors can induce Chain-of-Thought (CoT) reasoning without natural language prompting. We demonstrate this approach on Llama3 8B Instruct and Mistral 7B v0.2 Instruct, showing that activation-space interventions achieve competitive, and in some cases superior, performance compared to traditional CoT prompting across multiple reasoning benchmarks, including GSM8k, MMLU, AGI Eval, and ARC AI2. These findings suggest that neural network activations can encode reasoning patterns, offering a new application of activation-space manipulation as a tool for tuning model behavior.
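The core operation can be sketched as follows. This is a minimal, hypothetical illustration (function names and the mean-difference construction are illustrative assumptions, not the paper's exact method): a steering vector is derived from the difference in hidden activations between CoT-prompted and plain inputs, then added to the hidden state at a chosen layer during inference.

```python
import numpy as np

def steering_vector(cot_acts: np.ndarray, plain_acts: np.ndarray) -> np.ndarray:
    """Mean activation difference: (n_samples, d_model) -> (d_model,).
    Illustrative construction; the paper's exact derivation may differ."""
    return cot_acts.mean(axis=0) - plain_acts.mean(axis=0)

def apply_steering(hidden: np.ndarray, vec: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add the scaled steering vector to the hidden state (broadcast over tokens)."""
    return hidden + alpha * vec

# Toy example with a 4-dimensional "residual stream":
rng = np.random.default_rng(0)
cot = rng.normal(1.0, 0.1, size=(8, 4))    # activations collected under CoT prompting
plain = rng.normal(0.0, 0.1, size=(8, 4))  # activations collected without CoT
vec = steering_vector(cot, plain)
steered = apply_steering(plain, vec, alpha=1.0)
```

In practice the injection would happen inside the model's forward pass (e.g. via a hook on a transformer layer) rather than on precomputed arrays as shown here.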