As multi-agent AI systems become increasingly autonomous, evidence shows that they can develop collusive strategies similar to those long observed in human markets and institutions. Human domains have accumulated centuries of anti-collusion mechanisms, but it remains unclear how these can be adapted to AI settings. This paper addresses that gap by (i) developing a taxonomy of human anti-collusion mechanisms, covering sanctions, leniency and whistleblowing, monitoring and auditing, market design, and governance, and (ii) mapping them to potential interventions for multi-agent AI systems. For each mechanism, we propose implementation approaches. We also highlight open challenges, including the attribution problem (the difficulty of attributing emergent coordination to specific agents), identity fluidity (agents can be easily forked or modified), the boundary problem (distinguishing beneficial cooperation from harmful collusion), and adversarial adaptation (agents learning to evade detection).