Language Reasoning Models (LRMs) achieve strong performance by scaling test-time computation, but they often suffer from ``overthinking'', producing excessively long reasoning traces that increase latency and memory usage. Existing methods typically enforce conciseness with uniform length penalties, which over-compress crucial early deduction steps at the sequence level and indiscriminately penalize all queries at the group level. To address these limitations, we propose \textbf{\model}, a dual-level framework for prefix-protected, difficulty-aware compression under hierarchical supervision. At the sequence level, prefix-protected optimization employs decaying mixed rollouts to preserve valid reasoning paths while promoting conciseness. At the group level, a difficulty-aware penalty dynamically scales length constraints with query complexity, maintaining exploration on harder questions while curbing redundancy on easier ones. Extensive experiments on DeepSeek-R1-Distill-Qwen (1.5B/7B) show that \model reduces token usage substantially (by up to \textbf{55.7\%}) while simultaneously improving accuracy (by up to \textbf{4.1\%}) on math benchmarks, and it further generalizes to code, science, and general domains.