Recently, GPT-4 with Vision (GPT-4V) has demonstrated remarkable visual capabilities across various tasks, but its performance in emotion recognition has not been fully evaluated. To bridge this gap, we present the quantitative evaluation results of GPT-4V on 21 benchmark datasets covering 6 tasks: visual sentiment analysis, tweet sentiment analysis, micro-expression recognition, facial emotion recognition, dynamic facial emotion recognition, and multimodal emotion recognition. This paper collectively refers to these tasks as ``Generalized Emotion Recognition (GER)''. Through experimental analysis, we observe that GPT-4V exhibits strong visual understanding capabilities in GER tasks. Meanwhile, GPT-4V shows the ability to integrate multimodal cues and exploit temporal information, both of which are critical for emotion recognition. However, it is worth noting that GPT-4V is primarily designed for general domains and cannot recognize micro-expressions, which require specialized knowledge. To the best of our knowledge, this paper provides the first quantitative assessment of GPT-4V for GER tasks. We have open-sourced the code and encourage subsequent researchers to broaden the evaluation scope by including more tasks and datasets. Our code and evaluation results are available at: https://github.com/zeroQiaoba/gpt4v-emotion.