Large reasoning models (LRMs) achieve strong performance by producing long chains of thought, but their inference cost is high and they often generate redundant reasoning. Small language models (SLMs) are far more efficient, yet struggle on multi-step reasoning tasks. A natural idea is to let a large model mentor a small one at inference time, but existing collaboration methods often promote imitation, resulting in verbose reasoning without consistent error correction. We propose MentorCollab, an inference-time collaboration method in which an LRM selectively and sparsely guides an SLM rather than taking over generation. At randomly sampled token positions, we probe for divergences between the two models and use a lightweight verifier to decide whether the SLM should adopt a short lookahead segment from its mentor or continue on its own. Across 15 SLM--LRM pairs and 3 domains (math reasoning, general knowledge, and commonsense reasoning), our method improves performance in 12 settings, with average gains of 3.0% and up to 8.0%, while having only 18.4% of tokens on average generated by the expensive mentor model. We find that short segments and selective probing are sufficient for effective collaboration. Our results show that selective inference-time guidance delivers large-model reasoning ability without substantial inference overhead.
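The probe-and-verify decoding loop described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the toy `slm_next_token`, `lrm_lookahead`, and `verifier_accepts` functions are hypothetical stand-ins for the SLM decoding step, the mentor's short lookahead, and the lightweight verifier.

```python
import random

def slm_next_token(ctx):
    return "a"  # toy stand-in for one SLM decoding step

def lrm_lookahead(ctx, k):
    return ["b"] * k  # toy stand-in for a k-token mentor lookahead segment

def verifier_accepts(ctx, segment):
    return True  # toy stand-in for the lightweight verifier's decision

def mentor_collab_decode(prompt, max_tokens=50, probe_prob=0.2,
                         lookahead=4, seed=0):
    """Selective mentor-guided decoding: the SLM generates by default,
    and at sparsely sampled positions a mentor segment may be adopted."""
    rng = random.Random(seed)
    tokens = list(prompt)
    mentor_tokens = 0
    while len(tokens) < max_tokens:
        # Sparse probing: only occasionally compare the two models.
        if rng.random() < probe_prob:
            slm_tok = slm_next_token(tokens)
            segment = lrm_lookahead(tokens, lookahead)[:max_tokens - len(tokens)]
            # On divergence, the verifier decides whether the SLM should
            # follow the mentor's short lookahead segment.
            if segment and segment[0] != slm_tok and verifier_accepts(tokens, segment):
                tokens.extend(segment)
                mentor_tokens += len(segment)
                continue
        # Default: the SLM continues generating on its own.
        tokens.append(slm_next_token(tokens))
    return tokens, mentor_tokens

out, n_mentor = mentor_collab_decode([], max_tokens=50)
print(len(out), n_mentor)
```

Because probing is sparse and segments are short, most tokens come from the cheap SLM; the mentor's share is controlled by `probe_prob` and `lookahead`, mirroring the small fraction of mentor-generated tokens reported above.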