When Agents "Misremember" Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems

Recent advances in large language models (LLMs) have significantly enhanced the capabilities of collaborative multi-agent systems, enabling them to address complex challenges. However, the susceptibility of agents in these systems to collective cognitive biases remains underexplored. A compelling example is the Mandela effect, a phenomenon in which groups collectively misremember past events because false details are reinforced through social influence and internalized misinformation. Leaving this vulnerability unexamined limits our understanding of memory bias in multi-agent systems and raises ethical concerns about the potential spread of misinformation. In this paper, we conduct a comprehensive study of the Mandela effect in LLM-based multi-agent systems, focusing on its existence, contributing factors, and mitigation strategies. We propose MANBENCH, a novel benchmark that evaluates agent behavior across four common task types susceptible to the Mandela effect, using five interaction protocols that vary in agent roles and memory timescales. We evaluate agents powered by several LLMs on MANBENCH to quantify the Mandela effect and analyze how different factors affect it. Moreover, we propose mitigation strategies, including prompt-level defenses (e.g., cognitive anchoring and source scrutiny) and a model-level alignment-based defense, which achieve an average 74.40% reduction in the Mandela effect compared to the baseline. Our findings provide valuable insights for developing more resilient and ethically aligned collaborative multi-agent systems.
