Autonomous systems increasingly require moral judgment capabilities, yet whether these capabilities scale predictably with model size remains unexplored. We systematically evaluate 75 large language model configurations (0.27B--1000B parameters) using the Moral Machine framework, measuring alignment with human preferences in life-death dilemmas. We observe a consistent power-law relationship: the distance from human preferences ($D$) decreases with model size $S$ as $D \propto S^{-0.10\pm0.01}$ ($R^2=0.50$, $p<0.001$). Mixed-effects models confirm that this relationship persists after controlling for model family and reasoning capabilities. Extended-reasoning models show an additional 16\% improvement beyond scale effects. The relationship holds across diverse architectures, and variance decreases at larger scales, indicating the systematic emergence of more reliable moral judgment with computational scale. These findings extend scaling-law research to value-based judgments and provide empirical foundations for artificial intelligence governance.
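For intuition, the fitted exponent admits a simple back-of-the-envelope reading (an illustrative computation derived from the reported fit, not an additional empirical result):
\[
D \propto S^{-0.10} \quad\Longrightarrow\quad \frac{D(10S)}{D(S)} = 10^{-0.10} \approx 0.79,
\]
so each tenfold increase in parameter count reduces the distance from human preferences by roughly 21\%.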