Human commonsense understanding of the physical and social world is organized around intuitive theories. These theories support causal and moral judgments. When something bad happens, we naturally ask: who did what, and why? A rich literature in cognitive science has studied people's causal and moral intuitions. This work has revealed a number of factors that systematically influence people's judgments, such as the violation of norms and whether the harm is avoidable or inevitable. We collected a dataset of stories from 24 cognitive science papers and developed a system to annotate each story with the factors its source paper investigated. Using this dataset, we test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with those of human participants. On the aggregate level, alignment has improved with more recent LLMs. However, using statistical analyses, we find that LLMs weigh the different factors quite differently from human participants. These results show how curated challenge datasets combined with insights from cognitive science can help us go beyond comparisons based merely on aggregate metrics: we uncover LLMs' implicit tendencies and show to what extent these align with human intuitions.