Autonomous systems increasingly require moral judgment capabilities, yet whether these capabilities scale predictably with model size remains unexplored. We systematically evaluate 75 large language model configurations (0.27B--1000B parameters) using the Moral Machine framework, measuring alignment with human preferences in life-death dilemmas. We observe a consistent power-law relationship: the distance from human preferences ($D$) decreases with model size ($S$) as $D \propto S^{-0.10\pm0.01}$ ($R^2=0.50$, $p<0.001$). Mixed-effects models confirm that this relationship persists after controlling for model family and reasoning capabilities. Extended reasoning models show significantly better alignment, and this effect is more pronounced in smaller models (size$\times$reasoning interaction: $p = 0.024$). The relationship holds across diverse architectures, and variance decreases at larger scales, indicating a systematic emergence of more reliable moral judgment with computational scale. These findings extend scaling law research to value-based judgments and provide empirical foundations for artificial intelligence governance.
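To make the reported exponent concrete: under $D \propto S^{-0.10}$, a tenfold increase in model size multiplies the distance from human preferences by $10^{-0.10} \approx 0.79$, a reduction of roughly 21%. The sketch below illustrates how such an exponent could be recovered by ordinary least squares in log-log space. It is only an illustration under stated assumptions: the data points are hypothetical and noise-free, the prefactor of 1.0 is arbitrary, the function names `distance_ratio` and `fit_power_law` are invented here, and a plain regression stands in for the mixed-effects models described above.

```python
import math


def distance_ratio(scale_factor: float, exponent: float = -0.10) -> float:
    """Factor by which D changes when model size is multiplied by
    scale_factor, assuming D is proportional to S ** exponent."""
    return scale_factor ** exponent


def fit_power_law(sizes, distances):
    """Ordinary least-squares fit of log10(D) = intercept + exponent * log10(S).

    Returns (exponent, prefactor), where D is approximately
    prefactor * S ** exponent.
    """
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(d) for d in distances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    intercept = mean_y - exponent * mean_x
    return exponent, 10 ** intercept


# Hypothetical model sizes in billions of parameters (not the paper's data).
sizes_b = [0.27, 1, 7, 70, 400, 1000]
# Synthetic, noise-free distances generated from the reported exponent;
# the prefactor 1.0 is an arbitrary assumption.
distances = [1.0 * s ** -0.10 for s in sizes_b]

exponent, prefactor = fit_power_law(sizes_b, distances)
ratio_10x = distance_ratio(10)

print(f"fitted exponent: {exponent:.3f}")
print(f"D ratio for a 10x larger model: {ratio_10x:.3f}")
```

On real measurements the points would scatter around the fitted line (the abstract reports $R^2=0.50$), so the fitted exponent would carry an uncertainty like the reported $\pm0.01$ rather than being recovered exactly.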