Eliciting reasoning by inducing explicit thinking has emerged as a powerful technique for improving the performance of large language models (LLMs) on complex tasks. However, its effectiveness in realistic user-engaged agent scenarios remains unclear. In this paper, we conduct a comprehensive study of the effect of explicit thinking in user-engaged LLM agents. Our experiments span seven models, three benchmarks, and two thinking instantiations, and we evaluate them through both a quantitative response-taxonomy analysis and qualitative failure-propagation case studies. Contrary to expectations, we find that mandatory thinking often backfires on agents in user-engaged settings, causing anomalous performance degradation across various LLMs. Our key finding is that thinking makes agents more ``introverted'': it shortens responses and reduces information disclosure to users, which weakens agent-user information exchange and leads to downstream task failures. Furthermore, we demonstrate that explicitly prompting for information disclosure reliably improves performance across diverse model families, suggesting that proactive transparency is a vital lever for agent optimization. Overall, our study suggests that information-transparency awareness is a crucial yet underexplored perspective for the future design of reasoning agents in real-world scenarios. Our code is available at https://github.com/deeplearning-wisc/Thinking-Agent.