Large language model (LLM) agents are increasingly deployed in multi-agent systems where they must coordinate and agree on shared decisions. We ask whether classical resilient consensus theory, developed for deterministic agents, transfers to LLM agents that may behave adversarially. Framing LLM agreement as a Byzantine consensus game, we run controlled experiments on complete and general communication graphs. We find that prompted LLM agents fail to reach agreement that is achievable in principle: consensus can fail even in settings where classical theory guarantees that a convergent algorithm exists, and this failure persists across temperatures and horizons. At the same time, wrapping the agents with classical resilient consensus filters improves agreement. The benefit of filtering depends on how much robustness the underlying topology already provides. Our results suggest that classical resilient consensus theory is a useful lens for the safety of agentic AI.
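To make the idea of "wrapping the agents with classical resilient consensus filters" concrete, the following is a minimal sketch of one well-known filter from the classical literature, a trimmed-mean update in the style of W-MSR (Weighted Mean-Subsequence-Reduced). The abstract does not say which filters the experiments use, so the choice of filter, the function names `wmsr_step` and `run_complete_graph`, the scalar-valued proposals, and the fixed adversary value are all assumptions made for illustration only.

```python
from typing import List


def wmsr_step(own: float, neighbor_values: List[float], f: int) -> float:
    """One W-MSR-style update for a single honest agent (illustrative sketch).

    Among the received values, drop up to f of the largest values that are
    strictly above `own` and up to f of the smallest values strictly below
    `own`, then average what remains together with `own`.
    """
    higher = sorted(v for v in neighbor_values if v > own)
    lower = sorted(v for v in neighbor_values if v < own)
    equal = [v for v in neighbor_values if v == own]

    kept_higher = higher[: max(len(higher) - f, 0)]  # discard the f largest
    kept_lower = lower[f:]                           # discard the f smallest

    kept = [own] + equal + kept_lower + kept_higher
    return sum(kept) / len(kept)


def run_complete_graph(honest: List[float], adversary_value: float,
                       f: int, rounds: int) -> List[float]:
    """Hypothetical toy setup: honest agents on a complete graph plus one
    adversary that broadcasts the same fixed value every round."""
    values = list(honest)
    for _ in range(rounds):
        values = [
            wmsr_step(
                values[i],
                [values[j] for j in range(len(values)) if j != i]
                + [adversary_value],
                f,
            )
            for i in range(len(values))
        ]
    return values


if __name__ == "__main__":
    # Three honest agents and one adversary, tolerating f = 1 faulty neighbor.
    print(run_complete_graph([1.0, 2.0, 3.0], adversary_value=1000.0,
                             f=1, rounds=20))
```

In an LLM setting, one plausible reading is that each agent's numeric proposal would pass through such a filter before being fed back into the next round's prompt. With `f=0` the filter reduces to a plain average, which the adversary's value can pull arbitrarily far from the honest range.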