VEATIC: Video-based Emotion and Affect Tracking in Context Dataset

Human affect recognition has been a significant topic in psychophysics and computer vision. However, currently published datasets have many limitations. For example, most datasets contain frames that carry information only about facial expressions. Because of these limitations, it is very hard either to understand the mechanisms of human affect recognition or for computer vision models trained on those datasets to generalize well to common cases. In this work, we introduce a new large dataset, the Video-based Emotion and Affect Tracking in Context Dataset (VEATIC), that addresses the limitations of previous datasets. VEATIC has 124 video clips from Hollywood movies, documentaries, and home videos, with continuous valence and arousal ratings for each frame obtained via real-time annotation. Along with the dataset, we propose a new computer vision task: inferring the affect of a selected character from both context and character information in each video frame. Additionally, we propose a simple model to benchmark this new task. We also compare the performance of the model pretrained on our dataset with that obtained using other similar datasets. Experiments show competitive results for our model pretrained on VEATIC, indicating the generalizability of VEATIC. Our dataset is available at https://veatic.github.io.

