Despite the empirical success of extensive, step-by-step reasoning in large multimodal models, long reasoning processes inevitably incur substantial computational overhead, i.e., higher token costs and increased response time, which undermines inference efficiency. In contrast, humans often employ sketch-style reasoning: a concise, goal-directed cognitive process that prioritizes salient information and enables efficient problem solving. Inspired by this cognitive efficiency, we propose SketchThinker-R1, which incentivizes sketch-style reasoning ability in large multimodal models. Our method consists of three primary stages. In the Sketch-Mode Cold Start stage, we convert standard long reasoning processes into sketch-style reasoning and finetune the base multimodal model, instilling an initial sketch-style reasoning capability. Next, we train the SketchJudge reward model, which explicitly evaluates the model's thinking process and assigns higher scores to sketch-style reasoning. Finally, we conduct Sketch-Thinking Reinforcement Learning under the supervision of SketchJudge to further generalize the sketch-style reasoning ability. Experimental evaluation on four benchmarks reveals that SketchThinker-R1 achieves over a 64% reduction in reasoning token cost without compromising final answer accuracy. Qualitative analysis further shows that sketch-style reasoning focuses more on key cues during problem solving.
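To make the reinforcement-learning stage concrete, the reward signal could plausibly blend answer correctness with a SketchJudge-style conciseness score. The sketch below is a minimal, hypothetical illustration of such a blended reward; the function names, the linear token-budget decay, the budget of 512 tokens, and the weight `alpha` are all illustrative assumptions, not the paper's actual implementation.

```python
def sketch_judge_score(reasoning_tokens: int, max_budget: int = 512) -> float:
    """Toy stand-in for the SketchJudge reward model: favors shorter
    reasoning traces, decaying linearly to 0 at the token budget.
    (Hypothetical; the real SketchJudge scores trace content, not length alone.)"""
    return max(0.0, 1.0 - reasoning_tokens / max_budget)


def combined_reward(answer_correct: bool,
                    reasoning_tokens: int,
                    alpha: float = 0.5) -> float:
    """Blend answer accuracy with the sketch-style score.
    An incorrect answer earns zero reward, so conciseness can never
    override correctness -- matching the paper's claim that token
    savings come without an accuracy drop."""
    if not answer_correct:
        return 0.0
    return (1.0 - alpha) + alpha * sketch_judge_score(reasoning_tokens)


# Under these assumptions, a concise correct trace outscores a verbose one,
# and a wrong answer gets no credit regardless of brevity.
short_trace = combined_reward(True, reasoning_tokens=100)
long_trace = combined_reward(True, reasoning_tokens=500)
wrong = combined_reward(False, reasoning_tokens=10)
```

This design choice, gating the conciseness bonus behind correctness, is one simple way to encourage shorter reasoning without rewarding answers that are merely brief.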