Emotion recognition is the task of classifying perceived emotions in people. Previous works have used various nonverbal cues to extract features from images and correlate them with emotions. Among these cues, situational context is particularly important for emotion perception because it can directly influence a person's emotional state. In this paper, we propose an approach for extracting high-level context representations from images. The model relies on a single cue and a single encoding stream to correlate this representation with emotions. Our model performs competitively with the state of the art, achieving an mAP of 0.3002 on the EMOTIC dataset while running at approximately 90 frames per second on consumer-grade hardware. Overall, our approach is more efficient than previous models and can be readily deployed to address real-world emotion recognition problems.
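The reported score is a mean average precision (mAP). The sketch below is a minimal illustration that assumes the metric is the mean, over emotion categories, of each category's non-interpolated average precision. The paper's own evaluation code may differ, for example in how it handles tied scores or interpolation. The function names `average_precision` and `mean_average_precision` are hypothetical and not taken from the paper.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP for one category: mean precision at each positive.

    Assumption: tied scores are broken by input order (stable sort).
    """
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("AP is undefined for a category with no positives")
    # Rank samples by predicted score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / num_pos


def mean_average_precision(
    labels: Sequence[Sequence[int]], scores: Sequence[Sequence[float]]
) -> float:
    """labels, scores: one row per sample, one column per emotion category."""
    num_categories = len(labels[0])
    aps = []
    for c in range(num_categories):
        col_labels = [row[c] for row in labels]
        col_scores = [row[c] for row in scores]
        aps.append(average_precision(col_labels, col_scores))
    # mAP is the unweighted mean of the per-category APs.
    return sum(aps) / num_categories


# Toy multi-label example: 4 samples, 2 hypothetical categories.
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
print(mean_average_precision(labels, scores))  # 11/12, about 0.9167
```

Averaging per category rather than pooling all predictions keeps rare emotion categories from being dominated by frequent ones, which matters on a multi-label dataset with imbalanced classes.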