Recent multimodal large language models (MLLMs) have demonstrated significant potential in open-ended conversation, generating more accurate and personalized responses. However, their ability to memorize, recall, and reason over sustained, real-world interactions remains underexplored. This paper introduces MMRC, a Multi-Modal Real-world Conversation benchmark for evaluating six core open-ended abilities of MLLMs: information extraction, multi-turn reasoning, information update, image management, memory recall, and answer refusal. With data collected from real-world scenarios, MMRC comprises 5,120 conversations and 28,720 corresponding manually labeled questions, posing a significant challenge to existing MLLMs. Evaluations of 20 MLLMs on MMRC reveal a drop in accuracy during open-ended interactions. We identify four common failure patterns: long-term memory degradation, inadequacy in updating factual knowledge, accumulated assumption of error propagation, and reluctance to say no. To mitigate these issues, we propose a simple yet effective NOTE-TAKING strategy, which records key information from the conversation and reminds the model of it when responding, enhancing its conversational capabilities. Experiments across six MLLMs demonstrate significant performance improvements.
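The note-taking idea above can be illustrated with a minimal sketch: accumulate key facts as the conversation proceeds, then prepend them to each prompt so the model is reminded before answering. The `extract_key_facts` heuristic and the class/method names here are hypothetical stand-ins for whatever extraction mechanism (e.g., an auxiliary LLM call) the actual strategy uses; this is not the paper's implementation.

```python
def extract_key_facts(turn: str) -> list[str]:
    # Toy stand-in for real key-information extraction: keep sentences
    # that look like factual statements (contain " is " or " are ").
    return [s.strip() for s in turn.split(".")
            if " is " in s or " are " in s]


class NoteTakingChat:
    """Minimal sketch of a note-taking conversation wrapper."""

    def __init__(self) -> None:
        self.notes: list[str] = []

    def record(self, user_turn: str) -> None:
        # Record key information from this turn of the conversation.
        self.notes.extend(extract_key_facts(user_turn))

    def build_prompt(self, question: str) -> str:
        # Remind the model of the recorded notes before it responds.
        header = "\n".join(f"- {n}" for n in self.notes)
        return f"Notes so far:\n{header}\n\nQuestion: {question}"


chat = NoteTakingChat()
chat.record("My dog is named Rex. We walked in the park yesterday.")
prompt = chat.build_prompt("What is my dog's name?")
```

In a full system, `build_prompt`'s output would be sent to the MLLM each turn, so facts stated many turns earlier survive in the context even when the raw history is truncated.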