We study group decision-making in artificial societies where the rules of play are themselves subject to collective amendment. Using the self-amending game Nomic, we compare models at multiple scales from two LLM families and find that collective adaptation does not improve monotonically with model size. Instead, both families exhibit a narrow mid-scale regime that supports sustained rule adoption, diverse amendments, and balanced consensus. Smaller models tend to remain rule-inert, larger models often converge on restrictive voting patterns, and heterogeneous mixed-size groups collapse into veto-driven gridlock. These cross-scale contrasts persist under temperature perturbations and under a shift from unanimity to majority voting, although latent-state structure varies by family and scale. Hidden-state divergence alone does not explain collective performance: high representational divergence can coincide with poor behavioural outcomes. Linear probes reveal regime-selective coupling between latent vote-predictive signals and collective behaviour, but decodability is necessary, not sufficient, for adaptive play. Overall, the recurring regularity is the non-monotonicity itself, not the particular scale at which the optimum appears. Self-amending games therefore provide a controlled testbed for studying collective adaptation in artificial societies beyond raw model scale.