Social Human Robot Embodied Conversation (SHREC) Dataset: Benchmarking Foundational Models' Social Reasoning

Our work examines the social reasoning capabilities of foundation models in real-world human-robot interactions. We introduce the Social Human Robot Embodied Conversation (SHREC) Dataset, a benchmark of $\sim$400 real-world human-robot interaction videos and over 10K annotations that capture robot social errors, competencies, underlying rationales, and corrections. Unlike prior datasets focused on human-human interactions, the SHREC Dataset highlights the social challenges faced by real-world social robots, such as emotion understanding, intention tracking, and conversational mechanics. Moreover, current foundation models struggle to recognize these deficits, which manifest as subtle, socially situated failures. To evaluate AI models' capacity for social reasoning, we define eight benchmark tasks targeting critical areas such as (1) detection of social errors and competencies, (2) identification of underlying social attributes, (3) comprehension of interaction flow, and (4) provision of rationales and alternative correct actions. Experiments with state-of-the-art foundation models, alongside human evaluations, reveal substantial performance gaps, underscoring the difficulty of the problem and suggesting directions for developing socially intelligent AI.

