Large language models have achieved breakthroughs in complex reasoning through long chain-of-thought (CoT) sequences. However, long chains often suffer from severe reasoning inflation, which causes substantial computational redundancy. To maximize Intelligence per Token, we introduce a theoretical metric, Minimal Sufficient Length (MSL), which rigorously characterizes the shortest reasoning length that preserves answer correctness. We give a recursive definition based on independently sampled sequences and prove that its limit exists, establishing the first measurable lower bound for reasoning-chain compression. From an analysis of mainstream CoT compression strategies, we identify the key structural factors that enable a model to approach MSL. Based on these insights, we propose TRiMS, which combines the GRPO algorithm with MSL-based estimation during training and mitigates training instability through dynamic batch aggregation and advantage computation using a batch-level standard deviation. TRiMS achieves over 80% CoT token reduction with a minor accuracy boost across all benchmarks.
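To make the "advantage computation using batch-level standard deviation" idea concrete, the following is a minimal sketch and not the TRiMS implementation. It assumes that each reward is centered on the mean of its own prompt group, as in standard GRPO, and is then divided by the standard deviation of all rewards in the batch rather than by the per-group standard deviation. The function name `batch_std_advantages`, the `eps` constant, and the reward values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def batch_std_advantages(groups, eps=1e-6):
    """Compute GRPO-style advantages with a batch-level scale.

    groups: list of reward lists, one inner list per prompt (hypothetical layout).
    Assumption: center by the group mean, scale by the std of the whole batch.
    """
    # Pool every reward in the batch to get a single, shared scale.
    all_rewards = [r for group in groups for r in group]
    batch_std = pstdev(all_rewards)

    advantages = []
    for group in groups:
        group_mean = mean(group)
        # eps guards against division by zero when all rewards are identical.
        advantages.append([(r - group_mean) / (batch_std + eps) for r in group])
    return advantages


# Illustrative rewards: the second group differs only by tiny amounts.
example = [[1.0, 0.0, 1.0, 0.0], [0.90, 0.91, 0.90, 0.91]]
for adv in batch_std_advantages(example):
    print([round(a, 3) for a in adv])
```

Under per-group normalization, the second group's tiny reward differences would be rescaled to advantages of roughly plus or minus one. With a shared batch-level scale they stay small, which is one plausible reason such a choice could reduce training instability.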