Visual Emotion Analysis (VEA) aims to predict people's emotional responses to visual stimuli. It is a promising yet challenging task in affective computing that has drawn increasing attention in recent years. Most existing work in this area focuses on feature design, while little attention has been paid to dataset construction. In this work, we introduce EmoSet, the first large-scale visual emotion dataset annotated with rich attributes, which is superior to existing datasets in four aspects: scale, annotation richness, diversity, and data balance. EmoSet comprises 3.3 million images in total, of which 118,102 are carefully labeled by human annotators, making it five times larger than the largest existing dataset. EmoSet includes images from social networks as well as artistic images, and it is well balanced across emotion categories. Motivated by psychological studies, each image is annotated not only with an emotion category but also with a set of describable emotion attributes: brightness, colorfulness, scene type, object class, facial expression, and human action. These attributes help explain visual emotions in a precise and interpretable way. We validate their relevance by analyzing the correlations between the attributes and visual emotion, and by designing an attribute module that aids visual emotion recognition. We believe EmoSet will offer key insights and encourage further research in visual emotion analysis and understanding. Project page: https://vcc.tech/EmoSet.