Large Reasoning Models (LRMs) have demonstrated impressive capabilities but suffer from cognitive inefficiencies such as ``overthinking'' simple problems and ``underthinking'' complex ones. While existing methods based on supervised fine-tuning~(SFT) or reinforcement learning~(RL) with token-length rewards can improve efficiency, they often do so at the cost of accuracy. This paper introduces \textbf{DeepCompress}, a novel framework that simultaneously enhances both the accuracy and efficiency of LRMs. We challenge the prevailing approach of uniformly favoring shorter reasoning paths, showing that longer responses can contain a broader range of correct solutions for difficult problems. DeepCompress employs an adaptive length reward mechanism that dynamically classifies problems as ``Simple'' or ``Hard'' in real time, based on the model's evolving capability. It encourages shorter, more efficient reasoning for ``Simple'' problems while promoting longer, more exploratory thought chains for ``Hard'' problems. This dual-reward strategy enables the model to adjust its Chain-of-Thought (CoT) length autonomously, compressing reasoning for well-mastered problems and extending it for those it finds challenging. Experimental results on challenging mathematical benchmarks show that DeepCompress consistently outperforms baseline methods, achieving superior accuracy while significantly improving token efficiency.
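As a minimal sketch of the dual-reward idea (not the paper's exact formulation; the symbols $\hat{p}_q$ for the model's current empirical solve rate on problem $q$, the difficulty threshold $\tau$, the scale $\alpha$, the response length $|y|$, and the mean sampled length $\bar{L}_q$ are all assumed here for illustration), the adaptive length reward can be written as
\[
R_{\text{len}}(y,q) \;=\;
\begin{cases}
\alpha\,\dfrac{\bar{L}_q - |y|}{\bar{L}_q}, & \hat{p}_q \ge \tau \quad (\text{``Simple''}),\\[6pt]
\alpha\,\dfrac{|y| - \bar{L}_q}{\bar{L}_q}, & \hat{p}_q < \tau \quad (\text{``Hard''}).
\end{cases}
\]
Under this sketch, a problem the model already solves reliably ($\hat{p}_q \ge \tau$) earns a bonus for responses shorter than the running mean, while a problem it still struggles with earns a bonus for longer, more exploratory responses; because $\hat{p}_q$ is re-estimated as training proceeds, the classification shifts with the model's evolving capability.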