Multi-agent systems, in which multiple large language model agents solve problems through turn-based interaction, are increasingly deployed in high-stakes settings such as medical diagnosis, legal analysis, and forensic decision-making. Their reliability can be at risk when individual agents reason from incorrect or misleading context, e.g., tool-call outputs, because errors may propagate through agent interactions. This work studies this risk by injecting intent-based misinformation into otherwise benign single-agent and multi-agent systems across reasoning, knowledge, and alignment tasks. We find that misinformation can degrade single-agent performance and persists across multi-agent debate, with agents often retaining answers introduced by misinformed peers. Nevertheless, multi-agent debate reduces the resulting performance degradation compared to single-agent prompting, especially when most agents are not exposed to misinformation. Robustness depends on group composition and decision protocol: consensus can be more stable than voting under peer pressure, while majorities can often steer misinformed agents back toward correct answers. Our results show that misinformation robustness in multi-agent systems depends not only on the underlying model but also on how agents exchange information and aggregate decisions.