This paper investigates the emotional reasoning abilities of the GPT family of large language models from a componential perspective. It first examines how the models reason about autobiographical memories. Second, it systematically varies aspects of situations to influence emotion intensity and coping tendencies. Even without prompt engineering, GPT's predictions align significantly with human-provided appraisals and emotion labels. However, GPT has difficulty predicting emotion intensity and coping responses. GPT-4 performed best in the first study but fell short in the second, although it produced superior results after minor prompt engineering. This assessment raises questions about how to exploit the strengths and address the weaknesses of these models, particularly with respect to response variability. These studies underscore the merits of evaluating models from a componential perspective.