Long chain-of-thought (Long-CoT) reasoning models have motivated a growing body of work on compressing reasoning traces to reduce inference cost, yet existing evaluations focus almost exclusively on task accuracy and token savings. Trustworthiness properties, whether acquired or reinforced through post-training, are encoded in the same parameter space that compression modifies. Preserving accuracy therefore does not, a priori, guarantee preserving trustworthiness. We conduct the first systematic empirical study of how CoT compression affects model trustworthiness, evaluating multiple models of different scales along three dimensions: safety, hallucination resistance, and multilingual robustness. Under controlled comparisons, we find that CoT compression frequently introduces trustworthiness regressions and that different methods exhibit markedly different degradation profiles across dimensions. To enable fair comparison across base models, we propose a normalized efficiency score for each dimension that reveals how naïve scalar metrics can obscure trustworthiness trade-offs. As an existence proof, we further introduce an alignment-aware DPO variant that reduces CoT length by 19.3\% on reasoning benchmarks with substantially smaller trustworthiness loss. Our findings suggest that CoT compression should be optimized not only for efficiency but also for trustworthiness, treating both as equally important design constraints.