Large Reasoning Models (LRMs) solve complex tasks by generating long reasoning chains, but this reliance on verbose generation incurs significant latency and computational overhead. To address these challenges, we propose \textbf{CoSMo} (\textbf{Co}nsistency-Guided \textbf{S}plit-\textbf{M}erge \textbf{O}ptimization), a framework that eliminates structural redundancy rather than indiscriminately restricting token volume. Specifically, CoSMo uses a split-merge algorithm that dynamically refines reasoning chains, merging redundant segments and splitting where logical gaps arise to ensure coherence. We then apply structure-aligned reinforcement learning with a novel segment-level budget, which supervises the model to maintain efficient reasoning structures throughout training. Extensive experiments across multiple benchmarks and backbones show that CoSMo improves accuracy by \textbf{3.3} points while reducing segment usage by \textbf{28.7\%} on average relative to reasoning-efficiency baselines.