Emotional Support Conversation (ESC) requires not only affective expression but also grounded instrumental support to provide trustworthy guidance. However, existing ESC systems and benchmarks largely focus on affective support in text-only settings, overlooking how external tools can enable factual grounding and reduce hallucination in multi-turn emotional support. We introduce TEA-Bench, the first interactive benchmark for evaluating tool-augmented agents in ESC, featuring realistic emotional scenarios, an MCP-style tool environment, and process-level metrics that jointly assess the quality and factual grounding of emotional support. Experiments on nine LLMs show that tool augmentation generally improves emotional support quality and reduces hallucination, but the gains are strongly capacity-dependent: stronger models use tools more selectively and effectively, while weaker models benefit only marginally. We further release TEA-Dialog, a dataset of tool-enhanced ESC dialogues, and find that supervised fine-tuning improves in-distribution support but generalizes poorly. Our results underscore the importance of tool use in building reliable emotional support agents.