Large Reasoning Models (LRMs) solve complex tasks by generating long reasoning chains, but this reliance on verbose generation incurs significant latency and computational overhead. To address these challenges, we propose \textbf{CoSMo} (\textbf{Co}nsistency-Guided \textbf{S}plit-\textbf{M}erge \textbf{O}ptimization), a framework that eliminates structural redundancy rather than indiscriminately restricting token count. Specifically, CoSMo uses a split-merge algorithm that dynamically refines reasoning chains, merging redundant segments and splitting segments where logical gaps arise so that the chain stays coherent. We then apply structure-aligned reinforcement learning with a novel segment-level budget, which supervises the model to maintain efficient reasoning structures throughout training. Extensive experiments across multiple benchmarks and backbones show that, compared to reasoning-efficiency baselines, CoSMo improves accuracy by \textbf{3.3} points while reducing segment usage by \textbf{28.7\%} on average.