We present ThinkPrune, a simple yet effective method for pruning the thinking length of long-thinking LLMs, which often produce inefficient and redundant reasoning traces. Existing preliminary explorations of reducing thinking length focus primarily on forcing the thinking process to exit early, rather than adapting the LLM to optimize and consolidate its reasoning, so the length-performance tradeoffs observed so far are suboptimal. To fill this gap, ThinkPrune offers a simple solution: it continues training long-thinking LLMs via reinforcement learning (RL) under an added token limit, beyond which any unfinished thoughts and answers are discarded and receive zero reward. To further preserve model performance, we introduce an iterative length-pruning approach in which multiple rounds of RL are conducted, each with an increasingly stringent token limit. We observe that ThinkPrune achieves a remarkable performance-length tradeoff: on the AIME24 dataset, the reasoning length of DeepSeek-R1-Distill-Qwen-1.5B can be halved with only a 2% drop in performance. We also observe that after pruning, the LLMs bypass unnecessary steps while keeping the core reasoning process intact. Code is available at https://github.com/UCSB-NLP-Chang/ThinkPrune.
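The core mechanism described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `is_correct` verifier and the specific limit schedule (starting limit, number of rounds, shrink factor) are hypothetical placeholders.

```python
def clipped_reward(response_tokens, answer, gold, token_limit, is_correct):
    """Length-clipped reward: generations that do not finish within the
    token limit are discarded and receive zero reward; otherwise the
    usual correctness reward applies."""
    if len(response_tokens) > token_limit:
        return 0.0
    return 1.0 if is_correct(answer, gold) else 0.0

def iterative_limits(start=4000, rounds=3, shrink=0.5):
    """Iterative pruning schedule: one token limit per RL round, each
    round more stringent than the last (illustrative values)."""
    limits, limit = [], start
    for _ in range(rounds):
        limits.append(int(limit))
        limit *= shrink
    return limits
```

For example, with the placeholder schedule above, three RL rounds would train under limits of 4000, 2000, and 1000 tokens in turn, with `clipped_reward` zeroing out any rollout that overruns the current round's limit.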