Preserving Disagreement: Architectural Heterogeneity and Coherence Validation in Multi-Agent Policy Simulation

Multi-agent deliberation systems using large language models (LLMs) are increasingly proposed for policy simulation, yet they suffer from artificial consensus: evaluator agents converge on the same option regardless of their assigned value perspectives. We present the AI Council, a three-phase deliberation framework, and conduct 120 deliberations across two policy scenarios to test two interventions. First, architectural heterogeneity (assigning a different 7-9B parameter model to each value perspective) significantly reduces first-choice concentration compared to a homogeneous baseline (child welfare: 70.9% to 46.1%, p < 0.001, r = 0.58; housing: 46.0% to 22.9%, p < 0.001, r = 0.50). This contrasts with accuracy-oriented multi-agent debate, where heterogeneity does not reduce convergence, suggesting model diversity operates differently when no objectively correct answer exists. Second, coherence validation (using a frontier model to assess whether each evaluator's reasoning is grounded in its assigned values) reveals a fidelity-diversity tradeoff: on a scenario with a dominant option, it further reduces concentration (46.1% to 40.8%, p = 0.004), but on a scenario with genuinely competitive options, it increases concentration (22.9% to 26.6%, p = 0.96) by amplifying high-coherence evaluators who cluster on one option. This tradeoff may be a general property of multi-agent systems employing quality weighting. We report negative results from three failed Delphi designs, demonstrate that 8B models exhibit binary rather than graded responses to counter-arguments, and propose the trustworthy tension rate as a diagnostic measure of small-model deliberation capabilities.
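The fidelity-diversity tradeoff can be illustrated with a small numerical sketch. The abstract does not define first-choice concentration or the weighting scheme, so everything here is an assumption made for illustration: concentration is taken to be the share of first-choice mass on the most-chosen option, coherence scores are taken to act as simple multiplicative weights, and the function name `first_choice_concentration` and the toy data are hypothetical rather than the authors' implementation.

```python
from collections import defaultdict


def first_choice_concentration(first_choices, weights=None):
    """Share of (optionally weighted) first-choice mass on the modal option.

    ASSUMPTION: this definition is illustrative; the source does not
    specify how concentration or coherence weighting is computed.
    """
    if not first_choices:
        raise ValueError("need at least one evaluator")
    if weights is None:
        weights = [1.0] * len(first_choices)
    if len(weights) != len(first_choices):
        raise ValueError("one weight per evaluator is required")

    # Accumulate the weight each option receives as a first choice.
    mass = defaultdict(float)
    for option, weight in zip(first_choices, weights):
        mass[option] += weight

    total = sum(mass.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return max(mass.values()) / total


# Toy data (hypothetical): five evaluators, four competitive options.
choices = ["A", "A", "B", "C", "D"]
# Hypothetical coherence scores; the two high scorers both chose "A".
coherence = [0.9, 0.8, 0.3, 0.4, 0.2]

unweighted = first_choice_concentration(choices)
weighted = first_choice_concentration(choices, coherence)
```

In this toy case the unweighted concentration is 0.4, while the coherence-weighted value is roughly 0.65, because the two high-coherence evaluators happen to agree. That mirrors, under the stated assumptions, how quality weighting could raise concentration when options are otherwise competitive.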