Despite the strong reasoning capabilities of recent large language models (LLMs), achieving reliable performance on challenging tasks often requires post-training or computationally expensive sampling strategies, limiting their practical efficiency. In this work, we first show that a small subset of neurons in LLMs exhibits strong predictive correlations with reasoning correctness. Based on this observation, we propose AdaRAS (Adaptive Reasoning Activation Steering), a lightweight test-time framework that improves reasoning reliability by selectively intervening on neuron activations. AdaRAS identifies Reasoning-Critical Neurons (RCNs) via a polarity-aware mean-difference criterion and adaptively steers their activations during inference, correcting erroneous reasoning traces while avoiding degradation on already-correct cases. Experiments on 10 mathematics and coding benchmarks demonstrate consistent improvements, including gains of over 13% on AIME-24 and AIME-25. Moreover, AdaRAS exhibits strong transferability across datasets and scalability to stronger models, outperforming post-training methods without additional training or sampling cost.
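The RCN selection and steering steps described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the polarity-aware mean-difference criterion scores each neuron by the difference between its mean activation on correct versus incorrect traces, keeps the sign of that difference as the steering polarity, and applies an additive shift at inference. All names, shapes, and the synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic activations: rows = reasoning traces, cols = neurons.
# (Hypothetical data; the actual activation-extraction pipeline is not shown here.)
n_neurons = 64
acts_correct = rng.normal(0.0, 1.0, size=(200, n_neurons))
acts_incorrect = rng.normal(0.0, 1.0, size=(200, n_neurons))
# Plant two "reasoning-critical" neurons with opposite polarities.
acts_correct[:, 3] += 2.0   # higher activation correlates with correctness
acts_correct[:, 7] -= 2.0   # lower activation correlates with correctness

def select_rcns(correct_acts, incorrect_acts, k):
    """Polarity-aware mean-difference criterion (sketch):
    score each neuron by mean(correct) - mean(incorrect),
    rank by magnitude, and keep the sign as the steering polarity."""
    diff = correct_acts.mean(axis=0) - incorrect_acts.mean(axis=0)
    idx = np.argsort(-np.abs(diff))[:k]
    return idx, np.sign(diff[idx])

def steer(activations, idx, polarity, alpha=1.0):
    """Additively shift the selected neurons toward their 'correct' polarity."""
    steered = activations.copy()
    steered[:, idx] += alpha * polarity
    return steered

idx, polarity = select_rcns(acts_correct, acts_incorrect, k=2)
steered = steer(acts_incorrect, idx, polarity, alpha=2.0)
```

Under these synthetic statistics, the criterion recovers the two planted neurons and their polarities; the adaptive gating that decides *when* to intervene at test time is omitted from this sketch.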