While Large Reasoning Models (LRMs) have demonstrated impressive capabilities on complex tasks by generating long reasoning chains, this reliance on verbose generation incurs significant latency and computational overhead. To address these challenges, we propose \textbf{CoSMo} (\textbf{Co}nsistency-Guided \textbf{S}plit-\textbf{M}erge \textbf{O}ptimization), a framework designed to eliminate structural redundancy rather than indiscriminately restrict token volume. Specifically, CoSMo uses a split-merge algorithm that dynamically refines reasoning chains, merging redundant segments and splitting at logical gaps to preserve coherence. We then employ structure-aligned reinforcement learning with a novel segment-level budget that supervises the model in maintaining efficient reasoning structures throughout training. Extensive experiments across multiple benchmarks and backbones demonstrate that CoSMo achieves superior performance, improving accuracy by \textbf{3.3} points while reducing segment usage by \textbf{28.7\%} on average compared to reasoning-efficiency baselines.
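To make the split-merge idea concrete, the following is a minimal Python sketch of one refinement pass over a segmented reasoning chain. The helpers \texttt{consistency}, \texttt{coherent}, and \texttt{bridge}, along with the merge threshold, are hypothetical placeholders standing in for the paper's consistency-guided criteria; they are not the actual implementation.

\begin{verbatim}
# Minimal sketch of one split-merge refinement pass over a reasoning chain.
# `consistency(a, b)` (higher = more redundant overlap), `coherent(a, b)`
# (does b follow logically from a?), and `bridge(a, b)` (generate a missing
# intermediate step) are hypothetical helpers; the threshold is illustrative.

from typing import Callable, List

def split_merge(
    segments: List[str],
    consistency: Callable[[str, str], float],
    coherent: Callable[[str, str], bool],
    bridge: Callable[[str, str], str],
    merge_threshold: float = 0.9,
) -> List[str]:
    """Merge redundant neighbors, then bridge logical gaps between segments."""
    # Merge pass: fuse adjacent segments whose content is largely redundant.
    merged: List[str] = []
    for seg in segments:
        if merged and consistency(merged[-1], seg) >= merge_threshold:
            merged[-1] = merged[-1] + " " + seg  # collapse redundant neighbor
        else:
            merged.append(seg)

    # Split pass: where a logical gap separates neighbors, insert a bridge step.
    refined: List[str] = []
    for seg in merged:
        if refined and not coherent(refined[-1], seg):
            refined.append(bridge(refined[-1], seg))  # fill the missing step
        refined.append(seg)
    return refined
\end{verbatim}

The resulting segment count from such a pass is the natural quantity a segment-level budget could penalize during reinforcement learning, under the assumption that the reward trades off task accuracy against the number of retained segments.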