VEATIC: Video-based Emotion and Affect Tracking in Context Dataset

Human affect recognition is a significant topic in both psychophysics and computer vision. However, currently published datasets have many limitations. For example, most datasets consist of frames that carry information only about facial expressions. Because of these limitations, it is hard both to understand the mechanisms of human affect recognition and for computer vision models trained on those datasets to generalize well to common cases. In this work, we introduce a new large dataset, the Video-based Emotion and Affect Tracking in Context Dataset (VEATIC), designed to overcome the limitations of previous datasets. VEATIC contains 124 video clips from Hollywood movies, documentaries, and home videos, with continuous valence and arousal ratings for each frame obtained via real-time annotation. Along with the dataset, we propose a new computer vision task: inferring the affect of a selected character from both context and character information in each video frame. Additionally, we propose a simple model to benchmark this task. We also compare the performance of the model pretrained on our dataset against other similar datasets. Experiments show that our model pretrained on VEATIC achieves competitive results, indicating the generalizability of VEATIC. Our dataset is available at https://veatic.github.io.
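The proposed task can be read as a per-frame mapping from (context, selected character) to a (valence, arousal) pair, applied across a clip to produce a continuous track. The following is only a minimal sketch of that input/output shape. The names `Frame`, `track_affect`, and `neutral_baseline` are hypothetical, and neither the dataset's file layout nor the benchmark model's interface is described in the abstract.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Hypothetical types for illustration; not the VEATIC release format.
Affect = Tuple[float, float]  # (valence, arousal)


@dataclass
class Frame:
    context: Any    # full video frame (scene, other people, background)
    character: Any  # region or mask identifying the selected character


def track_affect(frames: List[Frame],
                 predict: Callable[[Frame], Affect]) -> List[Affect]:
    """Apply a per-frame predictor to every frame of one clip."""
    return [predict(frame) for frame in frames]


def neutral_baseline(frame: Frame) -> Affect:
    """Placeholder predictor: ignores its input and returns a fixed rating."""
    return (0.0, 0.0)


# Three empty placeholder frames stand in for a decoded clip.
clip = [Frame(context=None, character=None) for _ in range(3)]
ratings = track_affect(clip, neutral_baseline)
```

A real predictor would replace `neutral_baseline` with a model that reads both fields of `Frame`. The point of the sketch is only that every frame yields one valence value and one arousal value.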

