Recent work on activation and latent steering has demonstrated that modifying internal representations can effectively guide large language models (LLMs) toward improved reasoning and efficiency without additional training. However, most existing approaches rely on fixed steering policies and static intervention strengths, which limit their robustness across problem instances and often result in over- or under-steering. We propose Adaptive Test-time Latent Steering (ATLAS), a task-specific framework that dynamically controls steering decisions at inference time using an external, lightweight latent verifier. Given intermediate hidden states, the verifier predicts the quality of the ongoing reasoning and adaptively decides whether and how strongly to apply steering, enabling per-example and per-step adjustment with minimal overhead. To our knowledge, ATLAS is the first method to integrate learned latent verification into test-time steering for enhancing LLM reasoning. Experiments on multiple mathematical reasoning benchmarks show that ATLAS consistently outperforms both vanilla decoding and fixed-steering baselines, achieving higher accuracy while substantially reducing test-time token usage. These results demonstrate that verifier-guided latent adaptation provides an effective and scalable mechanism for controlling reasoning efficiency without sacrificing solution quality. All source code will be made publicly available.
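To make the mechanism concrete, the following is a minimal sketch of verifier-guided adaptive latent steering, assuming a PyTorch setting; the class names, MLP verifier architecture, threshold, and strength schedule are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch (assumed implementation, not the paper's released code) of
# verifier-guided adaptive latent steering: a lightweight probe scores the
# current hidden state, and a steering vector is applied only when the
# predicted reasoning quality falls below a threshold, with strength scaled
# by the quality deficit.
import torch
import torch.nn as nn


class LatentVerifier(nn.Module):
    """Lightweight probe mapping a hidden state to a quality score in [0, 1]."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(h)).squeeze(-1)


def adaptive_steer(
    h: torch.Tensor,             # hidden states at the intervention layer, (batch, hidden_dim)
    steering_vec: torch.Tensor,  # precomputed steering direction, (hidden_dim,)
    verifier: LatentVerifier,
    threshold: float = 0.5,      # hypothetical quality threshold
    max_alpha: float = 4.0,      # hypothetical maximum steering strength
) -> torch.Tensor:
    """Steer only when predicted quality is low; scale strength by the deficit."""
    with torch.no_grad():
        quality = verifier(h)                             # (batch,)
    deficit = torch.clamp(threshold - quality, min=0.0)   # zero when quality is high enough
    alpha = max_alpha * deficit / threshold               # per-example strength in [0, max_alpha]
    return h + alpha.unsqueeze(-1) * steering_vec         # per-example, per-step adjustment


# Toy usage with random tensors, just to show the shapes involved.
if __name__ == "__main__":
    dim = 64
    verifier = LatentVerifier(dim)
    h = torch.randn(2, dim)
    v = torch.randn(dim)
    print(adaptive_steer(h, v, verifier).shape)  # torch.Size([2, 64])
```

In an actual decoding loop, such a check would run at each generation step (or at selected steps) on the hidden state of the chosen layer, so steering strength can vary both across examples and across steps.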