Understanding affect is central to anticipating human behavior, yet current egocentric vision benchmarks largely ignore the emotional states that shape a person's decisions and actions. Existing tasks in egocentric perception focus on physical activities, hand-object interactions, and attention modeling, assuming neutral affect and uniform personality. This limits the ability of vision systems to capture key internal drivers of behavior. In this paper, we present egoEMOTION, the first dataset that couples egocentric visual and physiological signals with dense self-reports of emotion and personality across controlled and real-world scenarios. Our dataset includes over 50 hours of recordings from 43 participants, captured using Meta's Project Aria glasses. Each session provides synchronized eye-tracking video, head-mounted photoplethysmography, inertial motion data, and physiological baselines for reference. Participants completed emotion-elicitation tasks and naturalistic activities while self-reporting their affective state using the Circumplex Model and Mikels' Wheel, as well as their personality via the Big Five model. We define three benchmark tasks: (1) continuous affect classification (valence, arousal, dominance); (2) discrete emotion classification; and (3) trait-level personality inference. We show that a classical learning-based method, used as a simple baseline for real-world affect prediction, produces better estimates from signals captured by egocentric vision systems than from the physiological reference signals. Our dataset establishes emotion and personality as core dimensions of egocentric perception and opens new directions in affect-driven modeling of behavior, intent, and interaction.
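To make the benchmark setup concrete, the sketch below illustrates one way task (1) could be approached with a classical learner on windowed egocentric-sensor features, evaluated subject-independently. It is a minimal sketch only: the feature set, window length, RandomForest choice, and the placeholder inputs (`pupil`, `imu_mag`, binary arousal labels) are illustrative assumptions, not the paper's actual baseline.

```python
# Minimal sketch (NOT the paper's exact baseline) of continuous-affect
# prediction from egocentric signals with a classical learner.
# All signal names, window sizes, and labels here are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold
from sklearn.metrics import f1_score

def window_features(signal: np.ndarray, fs: int, win_s: float = 10.0) -> np.ndarray:
    """Split a 1-D signal into fixed windows and compute simple per-window statistics."""
    step = int(win_s * fs)
    n = len(signal) // step
    wins = signal[: n * step].reshape(n, step)
    # mean, std, min, max per window: deliberately simple, classical features
    return np.stack([wins.mean(1), wins.std(1), wins.min(1), wins.max(1)], axis=1)

fs = 50  # assumed common sampling rate after resampling
rng = np.random.default_rng(0)
pupil = rng.normal(size=fs * 600)          # placeholder for real eye-tracking data
imu_mag = rng.normal(size=fs * 600)        # placeholder for real IMU magnitude
X = np.hstack([window_features(pupil, fs), window_features(imu_mag, fs)])
y = rng.integers(0, 2, size=len(X))        # placeholder binned arousal labels
groups = rng.integers(0, 6, size=len(X))   # placeholder participant IDs

# Leave-subjects-out cross-validation, a common protocol for affect datasets
scores = []
for tr, te in GroupKFold(n_splits=3).split(X, y, groups):
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[tr], y[tr])
    scores.append(f1_score(y[te], clf.predict(X[te])))
print(f"mean F1 across folds: {np.mean(scores):.3f}")
```

Grouping the cross-validation folds by participant, rather than by random windows, prevents within-subject leakage and matches how subject-independent affect recognition is typically evaluated.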