Chain-of-thought (CoT) prompting improves LLM reasoning but incurs high latency and memory cost due to verbose traces, motivating CoT compression that preserves correctness. Existing methods either shorten CoTs at the semantic level, which is often conservative, or prune tokens aggressively, which can discard task-critical cues and degrade accuracy. Moreover, combining the two is non-trivial due to sequential dependency, task-agnostic pruning, and distribution mismatch. We propose \textbf{CtrlCoT}, a dual-granularity CoT compression framework that harmonizes semantic abstraction and token-level pruning through three components: Hierarchical Reasoning Abstraction produces CoTs at multiple semantic granularities; Logic-Preserving Distillation trains a logic-aware pruner to retain indispensable reasoning cues (e.g., numbers and operators) across pruning ratios; and Distribution-Alignment Generation aligns compressed traces with fluent inference-time reasoning styles to avoid fragmentation. On MATH-500 with Qwen2.5-7B-Instruct, CtrlCoT uses 30.7\% fewer tokens while achieving accuracy 7.6 percentage points higher than the strongest baseline, demonstrating more efficient and reliable reasoning. Our code will be publicly available at https://github.com/fanzhenxuan/Ctrl-CoT.