Affective reactions have deep biological foundations; in humans, however, the development of emotion concepts is also shaped by language and higher-order cognition. A recent breakthrough in AI has been the creation of multimodal language models that exhibit impressive intellectual capabilities, but their responses to affective stimuli have not been investigated. Here we study whether state-of-the-art multimodal systems can emulate human emotional ratings on a standardized set of images, both in terms of affective dimensions and basic discrete emotions. The AI judgements correlate surprisingly well with the average human ratings: given that these systems were not explicitly trained to match human affective reactions, this suggests that the ability to visually judge emotional content can emerge from statistical learning over large-scale databases of images paired with linguistic descriptions. Besides showing that language can support the development of rich emotion concepts in AI, these findings have broad implications for the sensitive use of multimodal AI technology.
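The comparison described above amounts to correlating model-generated ratings with mean human ratings over the same images. A minimal sketch of that computation, with illustrative rating values that are not from the study, could look like this:

```python
# Hypothetical sketch: correlating model affect ratings with mean human ratings.
# The rating values below are illustrative placeholders, not data from the study.
import numpy as np

# Mean human valence ratings for five images (e.g. on a 1-9 scale)
human_valence = np.array([7.2, 2.1, 5.0, 8.3, 3.4])
# Ratings produced by a multimodal model for the same five images
model_valence = np.array([6.8, 2.5, 5.4, 8.0, 3.1])

# Pearson correlation between the two rating vectors
r = np.corrcoef(human_valence, model_valence)[0, 1]
print(round(r, 3))
```

The same computation would be repeated for each affective dimension (e.g. valence, arousal) and each discrete-emotion rating scale.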