Reasoning has substantially improved Large Language Models (LLMs) on analytical tasks such as mathematics and code generation, but its value for abstractive summarization remains unclear. To address this gap, we adapt general reasoning strategies to the summarization setting and conduct a large-scale comparative study of 8 reasoning strategies and 3 Large Reasoning Models (LRMs) across 8 diverse datasets, evaluating both summary quality and factual faithfulness. Our results show that reasoning is not a universal solution and its effectiveness depends strongly on the strategy and the summarization setting. In particular, we find a trade-off between summary quality and factual faithfulness. Explicit reasoning strategies often improve reference-based quality, but may weaken factual grounding, whereas implicit reasoning in LRMs shows the opposite tendency. We further find that increasing an LRM's internal reasoning budget does not reliably improve summarization and can even reduce factual consistency. These findings suggest that, for summarization, more reasoning is not always better. Effective reasoning should preserve faithful compression rather than induce over-elaboration. Our source code is publicly available.