Reasoning can significantly enhance the performance of Large Language Models (LLMs). While recent studies have exploited behavior-related prompt adjustments to enhance reasoning, these designs remain largely intuitive and lack a systematic analysis of the underlying behavioral patterns. Motivated by this, we investigate how models' reasoning behaviors shape reasoning outcomes from the perspective of behavioral patterns. We observe that models exhibit adaptive distributions of reasoning behaviors when responding to specific types of questions, and that structurally injecting these patterns can substantially influence the quality of the models' reasoning processes and outcomes. Building on these findings, we propose two optimization methods that require no parameter updates: InjectCorrect and InjectRLOpt. InjectCorrect guides the model by imitating behavioral patterns derived from its own past correct answers. InjectRLOpt learns a value function from historical behavior-pattern data and, via our proposed Reliability-Aware Softmax Policy, generates behavioral injectants during inference to steer the reasoning process. Our experiments demonstrate that both methods improve model performance across various reasoning tasks without modifying model parameters, achieving gains of up to 5.34% and 8.67%, respectively.