Understanding emotions is fundamental to human interaction and experience. Humans easily infer emotions from situations or facial expressions, infer situations from emotions, and perform a variety of other kinds of affective cognition. How adept is modern AI at these inferences? We introduce an evaluation framework for testing affective cognition in foundation models. Starting from psychological theory, we generate 1,280 diverse scenarios exploring the relationships between appraisals, emotions, expressions, and outcomes. We evaluate the abilities of foundation models (GPT-4, Claude-3, Gemini-1.5-Pro) and humans (N = 567) across carefully selected conditions. Our results show that foundation models tend to agree with human intuitions, matching or exceeding interparticipant agreement. In some conditions, models are "superhuman": they predict modal human judgements better than the average human does. All models benefit from chain-of-thought reasoning. This suggests that foundation models have acquired a human-like understanding of emotions and their influence on beliefs and behavior.