With the integration of multimodal large language models (MLLMs) into robotic systems and various AI applications, embedding emotional intelligence (EI) capabilities into these models is essential for enabling robots to effectively address human emotional needs and interact seamlessly in real-world scenarios. Existing benchmarks are static and restricted to text or text-image inputs, so they fail to capture the dynamic, multimodal nature of emotional expression in real-world interactions, making them inadequate for evaluating the EI of MLLMs. Based on established psychological theories of EI, we build EmoBench-M, a novel benchmark designed to evaluate the EI capabilities of MLLMs across 13 evaluation scenarios spanning three key dimensions: foundational emotion recognition, conversational emotion understanding, and socially complex emotion analysis. Evaluations of both open-source and closed-source MLLMs on EmoBench-M reveal a significant performance gap between models and humans, highlighting the need to further advance their EI capabilities. All benchmark resources, including code and datasets, are publicly available at https://emo-gml.github.io/.