Human affect recognition has been a significant topic in psychophysics and computer vision. However, currently published datasets have many limitations. For example, most datasets consist of frames that carry information only about facial expressions. Because of these limitations, it is very hard either to understand the mechanisms of human affect recognition or for computer vision models trained on those datasets to generalize well to common cases. In this work, we introduce a new large dataset, the Video-based Emotion and Affect Tracking in Context Dataset (VEATIC), that overcomes the limitations of previous datasets. VEATIC has 124 video clips from Hollywood movies, documentaries, and home videos, with continuous valence and arousal ratings for each frame obtained via real-time annotation. Along with the dataset, we propose a new computer vision task: inferring the affect of a selected character from both context and character information in each video frame. Additionally, we propose a simple model to benchmark this new task. We also compare the performance of the model pretrained on our dataset against other similar datasets. Experiments show competitive results for our model pretrained on VEATIC, indicating the generalizability of VEATIC. Our dataset is available at https://veatic.github.io.