The proliferation of memes on social media demands that multimodal Large Language Models (mLLMs) be able to understand multimodal harmfulness effectively. Existing evaluations predominantly measure mLLMs' detection accuracy on binary classification tasks, which often fails to capture how deeply models interpret harmfulness across diverse contexts. In this paper, we propose MemeArena, an agent-based, arena-style evaluation framework that provides a context-aware and unbiased assessment of mLLMs' understanding of multimodal harmfulness. Specifically, MemeArena simulates diverse interpretive contexts to formulate evaluation tasks that elicit perspective-specific analyses from mLLMs. By integrating these varied viewpoints and reaching consensus among evaluators, it enables fair and unbiased comparisons of mLLMs' ability to interpret multimodal harmfulness. Extensive experiments demonstrate that our framework effectively reduces the evaluation biases of judge agents and yields judgments that closely align with human preferences, offering valuable insights into reliable and comprehensive evaluation of mLLMs' multimodal harmfulness understanding. Our code and data are publicly available at https://github.com/Lbotirx/MemeArena.
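To make the described pipeline concrete, the following is a minimal sketch of the arena-style, consensus-based judging loop, assuming a generic chat-completion interface. All names here (query_model, PERSPECTIVES, JUDGES, elicit_analyses, pairwise_verdict, arena_round) are illustrative placeholders and not the released implementation; see the repository linked above for the authors' actual code.

```python
# Hypothetical sketch of arena-style consensus judging; not the authors' API.
from collections import Counter

# Illustrative interpretive contexts and judge roster (assumptions).
PERSPECTIVES = ["targeted-group member", "bystander", "content moderator"]
JUDGES = ["judge_a", "judge_b", "judge_c"]

def query_model(model: str, prompt: str) -> str:
    """Placeholder for a call to an mLLM; swap in a real client."""
    raise NotImplementedError

def elicit_analyses(candidate: str, meme_description: str) -> list[str]:
    """Elicit one perspective-specific harmfulness analysis per simulated context."""
    return [
        query_model(
            candidate,
            f"From the perspective of a {p}, analyze whether this meme "
            f"is harmful and why.\nMeme: {meme_description}",
        )
        for p in PERSPECTIVES
    ]

def pairwise_verdict(judge: str, analyses_a: list[str], analyses_b: list[str]) -> str:
    """One judge agent compares two candidates' integrated analyses; returns 'A' or 'B'."""
    prompt = (
        "Which set of analyses interprets the meme's harmfulness more "
        "faithfully across contexts? Answer 'A' or 'B'.\n"
        f"A: {analyses_a}\nB: {analyses_b}"
    )
    return query_model(judge, prompt).strip().upper()[:1]

def arena_round(cand_a: str, cand_b: str, meme: str) -> str:
    """Majority vote over judges stands in for the paper's consensus step."""
    a = elicit_analyses(cand_a, meme)
    b = elicit_analyses(cand_b, meme)
    votes = Counter(pairwise_verdict(j, a, b) for j in JUDGES)
    return cand_a if votes["A"] >= votes["B"] else cand_b
```

Aggregating verdicts from multiple judge agents, rather than trusting a single judge, is one plausible way the framework's consensus step could mitigate individual judges' evaluation biases.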