Human affect recognition has long been an important topic in psychophysics and computer vision. However, currently published datasets have many limitations. For example, most datasets consist of frames that contain only facial expression information. Because of these limitations, it is very hard both to understand the mechanisms of human affect recognition and for computer vision models trained on those datasets to generalize well to common cases. In this work, we introduce a new large dataset, the Video-based Emotion and Affect Tracking in Context Dataset (VEATIC), that overcomes the limitations of previous datasets. VEATIC contains 124 video clips from Hollywood movies, documentaries, and home videos, with continuous valence and arousal ratings for each frame collected via real-time annotation. Along with the dataset, we propose a new computer vision task: inferring the affect of a selected character from both context and character information in each video frame. Additionally, we propose a simple model to benchmark this new task. We also compare the performance of the model pretrained on our dataset against other similar datasets. Experiments show that our model pretrained on VEATIC achieves competitive results, indicating the generalizability of VEATIC. Our dataset is available at https://veatic.github.io.