LongEmotion: Measuring Emotional Intelligence of Large Language Models in Long-Context Interaction

Large language models (LLMs) have made significant progress in Emotional Intelligence (EI) and long-context modeling. However, existing benchmarks often overlook the fact that emotional information processing unfolds as a continuous long-context process. To address the absence of multidimensional EI evaluation in long-context inference and explore model performance under more challenging conditions, we present LongEmotion, a benchmark that encompasses a diverse suite of tasks targeting the assessment of models' capabilities in Emotion Recognition, Knowledge Application, and Empathetic Generation, with an average context length of 15,341 tokens. To enhance performance under realistic constraints, we introduce the Collaborative Emotional Modeling (CoEM) framework, which integrates Retrieval-Augmented Generation (RAG) and multi-agent collaboration to improve models' EI in long-context scenarios. We conduct a detailed analysis of various models in long-context settings, investigating how reasoning mode activation, RAG-based retrieval strategies, and context-length adaptability influence their EI performance. Our project page is: https://longemotion.github.io/
