This paper proposes a domain adaptation technique for recognizing the emotions portrayed by generic images, which may contain facial, non-facial, and non-human components. The technique addresses the scarcity of pre-trained models and well-annotated datasets for image emotion recognition (IER). First, a deep-learning-based facial emotion recognition (FER) system is proposed that classifies a given facial image into discrete emotion classes. Next, an IER system is proposed that uses domain adaptation to adapt the FER system to generic images, classifying them into 'happy,' 'sad,' 'hate,' and 'anger' classes. A novel interpretability approach, Divide and Conquer based Shap (DnCShap), is also proposed to identify the visual features most relevant to emotion recognition. The architecture of the proposed system was determined through ablation studies, and experiments were conducted on four FER and four IER datasets. The proposed IER system achieves an emotion classification accuracy of 59.61% on the IAPSa dataset, 57.83% on the ArtPhoto dataset, 67.93% on the FI dataset, and 55.13% on the EMOTIC dataset. The important visual features leading to a particular emotion class are identified, and embedding plots for the various emotion classes are analyzed to explain the system's predictions.
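The abstract does not spell out how DnCShap works, so the following is only an illustrative sketch of one way a divide-and-conquer strategy could be combined with Shapley values. It is not the authors' method. It assumes the image is split into quadrants, exact Shapley values are computed over those quadrants, and the most influential quadrant is split again. The helper names (`shapley_values`, `split`, `mask`, `refine`), the zero-valued masking baseline, and the toy scoring function are all hypothetical and stand in for a real emotion classifier.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a float."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes player p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def split(region):
    """Split (top, left, height, width) into up to four non-empty quadrants."""
    top, left, h, w = region
    h2, w2 = h // 2, w // 2
    quads = [
        (top, left, h2, w2),
        (top, left + w2, h2, w - w2),
        (top + h2, left, h - h2, w2),
        (top + h2, left + w2, h - h2, w - w2),
    ]
    return [q for q in quads if q[2] > 0 and q[3] > 0]


def mask(image, regions):
    """Return a copy of `image` with every pixel outside `regions` set to 0."""
    out = [[0.0] * len(row) for row in image]
    for top, left, h, w in regions:
        for r in range(top, top + h):
            for c in range(left, left + w):
                out[r][c] = image[r][c]
    return out


def refine(image, score, depth=2):
    """Return the most influential region at each level, coarse to fine."""
    region = (0, 0, len(image), len(image[0]))
    path = []
    for _ in range(depth):
        quads = split(region)
        if len(quads) < 2:  # region can no longer be divided
            break
        phi = shapley_values(quads, lambda s: score(mask(image, s)))
        region = max(quads, key=lambda q: phi[q])
        path.append((region, phi[region]))
    return path


if __name__ == "__main__":
    # Toy 4x4 "image" with one bright pixel; the toy score is the pixel sum.
    image = [[0.0] * 4 for _ in range(4)]
    image[3][1] = 1.0
    for region, contribution in refine(image, lambda img: sum(map(sum, img))):
        print(region, round(contribution, 3))
```

Restricting each level to four regions keeps the exact Shapley computation at 16 coalitions per level, which is the usual motivation for dividing the input before attributing. In practice, `score` would be the classifier's probability for one emotion class.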