Emotional Support Conversation (ESC) requires not only affective expression but also grounded instrumental support to provide trustworthy guidance. However, existing ESC systems and benchmarks largely focus on affective support in text-only settings, overlooking how external tools can enable factual grounding and reduce hallucination in multi-turn emotional support. We introduce TEA-Bench, the first interactive benchmark for evaluating tool-augmented agents in ESC, featuring realistic emotional scenarios, an MCP-style tool environment, and process-level metrics that jointly assess the quality and factual grounding of emotional support. Experiments on nine LLMs show that tool augmentation generally improves emotional support quality and reduces hallucination, but the gains are strongly capacity-dependent: stronger models use tools more selectively and effectively, while weaker models benefit only marginally. We further release TEA-Dialog, a dataset of tool-enhanced ESC dialogues, and find that supervised fine-tuning improves in-distribution support but generalizes poorly. Our results underscore the importance of tool use in building reliable emotional support agents.