Images have long proved excellent at both storing and conveying rich semantics, especially human emotions. Much research has sought to give machines the ability to recognize emotions in photos of people. Previous methods mostly focus on facial expressions and fail to consider the scene context, even though scene context plays an important role in predicting emotions and leads to more accurate results. In addition, Valence-Arousal-Dominance (VAD) values offer a more precise quantitative understanding of continuous emotions, yet predicting them has received less emphasis than predicting discrete emotional categories. In this paper, we present a novel Multi-Branch Network (MBN), which uses multiple sources of information, including faces, bodies, and scene contexts, to predict both discrete and continuous emotions in an image. Experimental results on the EMOTIC dataset, a large-scale collection of images of people in unconstrained situations labeled with 26 discrete emotion categories and VAD values, show that our proposed method significantly outperforms state-of-the-art methods, achieving 28.4% in mAP and 0.93 in MAE. The results highlight the importance of using multiple sources of contextual information in emotion prediction and illustrate the potential of our proposed method in a wide range of applications, such as affective computing, human-computer interaction, and social robotics. Source code: https://github.com/BaoNinh2808/Multi-Branch-Network-for-Imagery-Emotion-Prediction
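To make the two reported metrics concrete, the following is a minimal sketch of how mean average precision (mAP) over discrete emotion categories and mean absolute error (MAE) over VAD values are commonly computed. It is not taken from the linked repository: the function names, the input layout (one list of per-category scores or VAD values per person), and the non-interpolated form of average precision are assumptions made for illustration, and the evaluation protocol actually used in the paper may differ in detail.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one category (assumed definition):
    the mean of precision@k taken at every rank k that holds a positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    # A category with no positive samples contributes 0.0 in this sketch.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    scores: Sequence[Sequence[float]], labels: Sequence[Sequence[int]]
) -> float:
    """scores[i][c] is the predicted score of category c for sample i;
    labels[i][c] is 1 if sample i is annotated with category c, else 0."""
    num_categories = len(scores[0])
    per_category = [
        average_precision([s[c] for s in scores], [l[c] for l in labels])
        for c in range(num_categories)
    ]
    return sum(per_category) / num_categories


def vad_mean_absolute_error(
    predicted: Sequence[Sequence[float]], target: Sequence[Sequence[float]]
) -> float:
    """MAE averaged over all samples and all three VAD dimensions."""
    errors = [
        abs(p - t)
        for pred_vad, true_vad in zip(predicted, target)
        for p, t in zip(pred_vad, true_vad)
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy data: 3 people, 2 hypothetical categories (EMOTIC itself has 26).
    toy_scores = [[0.9, 0.2], [0.6, 0.8], [0.1, 0.4]]
    toy_labels = [[1, 0], [0, 1], [1, 0]]
    print(mean_average_precision(toy_scores, toy_labels))

    # Toy VAD triples (valence, arousal, dominance) for 2 people.
    toy_pred = [[5.0, 5.0, 5.0], [3.0, 7.0, 6.0]]
    toy_true = [[6.0, 5.0, 4.0], [3.0, 5.0, 6.0]]
    print(vad_mean_absolute_error(toy_pred, toy_true))
```

Under these definitions, a higher mAP is better for the discrete categories, while a lower MAE is better for the continuous VAD predictions.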