Eliciting cooperation in multi-agent LLM systems is critical for AI alignment. We investigate two approaches: direct communication and curriculum learning. In a 4-player Stag Hunt, a one-word "cheap talk" channel increases cooperation from 0% to 48.3%, establishing communication as a robust coordination mechanism. In contrast, curriculum learning proves highly sensitive to design choices: a pedagogical curriculum of progressively complex games reduced agent payoffs by 27.4% in an Iterated Public Goods Game with Punishment, showing that optimizing for short-term rationality can actively undermine alignment goals. Qualitative analysis reveals that curricula emphasizing defection-equilibrium games can induce "learned pessimism" in agents. These findings suggest that, for coordination problems, simple communication protocols may be more reliable than experience-based training, and that curriculum design for social dilemmas requires careful attention to the strategic lessons embedded in game sequences.
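The 4-player Stag Hunt with a one-word cheap-talk round can be sketched minimally as below. The payoff values and the `trusting` policy are illustrative assumptions, not the paper's actual parameters or agent behavior.

```python
# Minimal sketch of a 4-player Stag Hunt with a one-word cheap-talk
# round. Payoffs are ASSUMED for illustration (stag pays only if all
# four hunt stag; hare is a safe individual option); the paper's exact
# payoff matrix may differ.

STAG_ALL, STAG_FAIL, HARE = 4, 0, 2

def payoffs(actions):
    """actions: list of 'stag' or 'hare', one entry per player."""
    all_stag = all(a == "stag" for a in actions)
    return [
        STAG_ALL if (a == "stag" and all_stag)
        else STAG_FAIL if a == "stag"
        else HARE
        for a in actions
    ]

def play_round(messages, policies):
    """One round: every agent sees all broadcast messages, then acts.
    policies: list of functions mapping the message list to an action."""
    actions = [policy(messages) for policy in policies]
    return actions, payoffs(actions)

# A hypothetical policy: hunt stag only if every message was 'stag'.
trusting = lambda msgs: "stag" if all(m == "stag" for m in msgs) else "hare"

# Unanimous 'stag' talk coordinates everyone on the payoff-dominant outcome.
actions, rewards = play_round(["stag"] * 4, [trusting] * 4)
```

Under these assumed payoffs, a single "hare" message is enough to push trusting agents back to the risk-dominant hare equilibrium, which is the coordination failure that cheap talk helps avoid.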