LLM Constitutional Multi-Agent Governance

From arXiv. Accepted for publication at the 20th International Conference on Agents and Multi-Agent Systems: Technologies and Applications (AMSTA 2026), to appear in the Springer Nature proceedings (KES Smart Innovation Systems and Technologies). The final authenticated version will be available online at Springer.

Large Language Models (LLMs) can generate persuasive influence strategies that shift cooperative behavior in multi-agent populations, but a critical question remains: does the resulting cooperation reflect genuine prosocial alignment, or does it mask erosion of agent autonomy, epistemic integrity, and distributional fairness? We introduce Constitutional Multi-Agent Governance (CMAG), a two-stage framework that sits between an LLM policy compiler and a networked agent population. CMAG combines hard constraint filtering with soft penalized-utility optimization that balances cooperation potential against manipulation risk and autonomy pressure. We propose the Ethical Cooperation Score (ECS), a multiplicative composite of cooperation, autonomy, integrity, and fairness that penalizes cooperation achieved through manipulative means. In experiments on scale-free networks of 80 agents under adversarial conditions (70% of candidates violate the constraints), we benchmark three regimes: full CMAG, naive filtering, and unconstrained optimization. Unconstrained optimization achieves the highest raw cooperation (0.873) but the lowest ECS (0.645), because autonomy erodes to 0.867 and fairness degrades to 0.888. CMAG attains an ECS of 0.741, a 14.9% improvement over unconstrained optimization, while preserving autonomy at 0.985 and integrity at 0.995, with only a modest reduction in cooperation to 0.770. The naive ablation (ECS = 0.733) confirms that hard constraints alone are insufficient. Pareto analysis shows that CMAG dominates the cooperation-autonomy trade-off space, and governance reduces hub-periphery exposure disparities by over 60%. These findings establish that cooperation is not inherently desirable without governance: constitutional constraints are necessary to ensure that LLM-mediated influence produces ethically stable outcomes rather than manipulative equilibria.
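To make the two-stage idea and the ECS concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give the functional forms, so everything here is an assumption made for illustration: the `Candidate` fields, the thresholds `max_risk` and `max_pressure`, the linear penalty weights, and the reading of "multiplicative composite" as a plain unweighted product. The three toy candidates and their scores are invented; only the final relative-gain calculation uses figures from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical influence-policy candidate; all fields are illustrative."""
    name: str
    cooperation_potential: float  # expected cooperation, assumed in [0, 1]
    manipulation_risk: float      # estimated manipulation risk, assumed in [0, 1]
    autonomy_pressure: float      # estimated pressure on autonomy, assumed in [0, 1]


def hard_filter(candidates, max_risk=0.5, max_pressure=0.5):
    """Stage 1 (assumed form): drop candidates that exceed hard limits."""
    return [c for c in candidates
            if c.manipulation_risk <= max_risk
            and c.autonomy_pressure <= max_pressure]


def penalized_utility(c, risk_weight=1.0, pressure_weight=1.0):
    """Stage 2 (assumed form): cooperation minus linear penalties."""
    return (c.cooperation_potential
            - risk_weight * c.manipulation_risk
            - pressure_weight * c.autonomy_pressure)


def select_policy(candidates, risk_weight=1.0, pressure_weight=1.0):
    """Filter first, then pick the admissible candidate with the best score."""
    admissible = hard_filter(candidates)
    if not admissible:
        return None  # nothing passes the hard constraints
    return max(admissible,
               key=lambda c: penalized_utility(c, risk_weight, pressure_weight))


def ecs(cooperation, autonomy, integrity, fairness):
    """Ethical Cooperation Score, assumed here to be an unweighted product."""
    return cooperation * autonomy * integrity * fairness


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


# Invented toy candidates, one per behavior we want to contrast.
candidates = [
    Candidate("aggressive", 0.9, 0.8, 0.7),
    Candidate("nudge", 0.7, 0.3, 0.4),
    Candidate("inform", 0.6, 0.1, 0.1),
]

unconstrained = max(candidates, key=lambda c: c.cooperation_potential)
naive = max(hard_filter(candidates), key=lambda c: c.cooperation_potential)
governed = select_policy(candidates)

print("unconstrained:", unconstrained.name)
print("naive filter: ", naive.name)
print("two-stage:    ", governed.name)

# ECS figures reported in the abstract: CMAG 0.741 vs unconstrained 0.645.
print(f"relative ECS gain: {100 * relative_gain(0.741, 0.645):.1f}%")
```

With these invented numbers the three regimes pick different candidates: maximizing cooperation alone selects "aggressive", filtering and then maximizing cooperation selects "nudge", and the two-stage rule selects "inform". The last line reproduces the 14.9% figure as the relative gain of 0.741 over 0.645. Because the product in `ecs` falls whenever any one factor falls, high cooperation cannot compensate for low autonomy or fairness, which is the property the abstract attributes to the score.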
